CHAPTER 1


Notational conventions 


Contents

* Notational conventions
  - Introduction
  - Bit operations
  - Sign extension
  - Bitfield extraction


1.1 Introduction 


The semantics of many operations are described in pseudocode. Here are some commonly used primitives.


1.2 Bit operations 


In many places, the GPUs allow specifying arbitrary X-input boolean or bitwise operations, where X is 2, 3, or 4. They
are described by a 2**X-bit mask selecting the bit combinations for which the output should be true. The first input
supplies bit 0 of the mask bit index, the second input bit 1, and so on. For example, 2-input operation 0x4 (0b0100)
is ~v1 & v2: only bit 2 (0b10) is set, so the only input combination (0, 1) results in a true output. Likewise, 3-input
operation 0xaa (0b10101010) is simply a passthrough of the first input: the bits set in the mask are 1, 3, 5, 7
(0b001, 0b011, 0b101, 0b111), which correspond exactly to the input combinations which have the first input equal to 1.


The exact semantics of such operations are: 


    # single-bit version
    def bitop_single(op, *inputs):
        # first, construct mask bit index from the inputs
        bitidx = 0
        for idx, value in enumerate(inputs):
            if value:
                bitidx |= 1 << idx
        # second, the result is the given bit of the mask
        return op >> bitidx & 1

    # multi-bit version; sext() is defined under Sign extension below
    def bitop(op, *inputs):
        max_len = max(value.bit_length() for value in inputs)
        res = 0
        # perform bitop_single operation on each bit (+ 1 for sign bit)
        for x in range(max_len + 1):
            res |= bitop_single(op, *(value >> x & 1 for value in inputs)) << x
        # all bits starting from max_len will be identical - just what sext does
        return sext(res, max_len)


As further examples, the 2-input operations on a, b are:

* 0x0: always 0
* 0x1: ~a & ~b
* 0x2: a & ~b
* 0x3: ~b
* 0x4: ~a & b
* 0x5: ~a
* 0x6: a ^ b
* 0x7: ~a | ~b
* 0x8: a & b
* 0x9: ~a ^ b
* 0xa: a
* 0xb: a | ~b
* 0xc: b
* 0xd: ~a | b
* 0xe: a | b
* 0xf: always 1


For further enlightenment, you can search for GDI raster operations, which correspond to 3-input bit operations. 
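Because mask bit k is selected by the input combination whose bits spell k (first input in bit 0), the mask of any expression can be obtained by evaluating that expression bitwise on per-input masks: 0xa and 0xc for two inputs, 0xaa, 0xcc and 0xf0 for three. The Python sketch below illustrates this; input_masks, mask_of and check are hypothetical helper names, and bitop_single is repeated from above so the block runs on its own.

```python
from itertools import product


def bitop_single(op, *inputs):
    # Same lookup as above: input number idx supplies bit idx of the index.
    bitidx = 0
    for idx, value in enumerate(inputs):
        if value:
            bitidx |= 1 << idx
    return op >> bitidx & 1


def input_masks(n):
    # Mask of the passthrough of input i: bit k is set iff bit i of k is set.
    return [sum(1 << k for k in range(1 << n) if k >> i & 1) for i in range(n)]


def mask_of(expr, n):
    # Evaluate the expression on all input combinations at once,
    # then keep only the 2**n meaningful mask bits.
    return expr(*input_masks(n)) & ((1 << (1 << n)) - 1)


def check(expr, n):
    # Brute-force cross-check of mask_of against bitop_single.
    mask = mask_of(expr, n)
    return all(bitop_single(mask, *bits) == expr(*bits) & 1
               for bits in product((0, 1), repeat=n))


print([hex(m) for m in input_masks(3)])            # ['0xaa', '0xcc', '0xf0']
print(hex(mask_of(lambda a, b: ~a & b, 2)))        # 0x4
print(hex(mask_of(lambda a, b, c: a & b | c, 3)))  # 0xf8
```

The same trick reproduces every entry of the 2-input list above, e.g. mask_of(lambda a, b: a | ~b, 2) gives 0xb.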


1.3 Sign extension 


An often used primitive is sign extension from a given bit. This operation is known as sext, after the xtensa
instruction of the same name, and is formally defined as follows:
    def sext(val, bit):
        # mask with all bits up from #bit set
        mask = -1 << bit
        if val & 1 << bit:
            # sign bit set, negative, set all upper bits
            return val | mask
        else:
            # sign bit not set, positive, clear all upper bits
            return val & ~mask
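For comparison, the branch in sext can be replaced by a common bit trick: truncate the value to bit + 1 bits, flip the sign bit with XOR, then subtract it. This is a swapped-in formulation, not the definition used by this document; the sketch only checks that the two agree.

```python
def sext(val, bit):
    # Definition from above.
    mask = -1 << bit
    if val & 1 << bit:
        return val | mask
    return val & ~mask


def sext_xor(val, bit):
    # Branch-free variant: keep bits 0..bit, flip the sign bit, subtract it.
    # Sign bit set:   the XOR clears it and "- sign" makes the result negative.
    # Sign bit clear: the XOR sets it and "- sign" removes it again.
    sign = 1 << bit
    return ((val & ((sign << 1) - 1)) ^ sign) - sign


print(sext(0xff, 7), sext_xor(0xff, 7))  # -1 -1
print(sext(0x7f, 7), sext_xor(0x7f, 7))  # 127 127
```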


1.4 Bitfield extraction 


Another often used primitive is bitfield extraction. Extracting an unsigned bitfield of length l starting at position s in
val is denoted by extr(val, s, l), and a signed one by extrs(val, s, l):


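    def extr(val, s, l):
        # shift the field down to bit 0 and keep its low l bits
        return val >> s & ((1 << l) - 1)

    def extrs(val, s, l):
        # extract as unsigned, then sign-extend from the field's top bit
        return sext(extr(val, s, l), l - 1)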
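A short usage sketch with a made-up 16-bit value (no real register is implied), showing how the same bits read differently through extr and extrs. The definitions are repeated so the block runs on its own.

```python
def sext(val, bit):
    mask = -1 << bit
    return val | mask if val & 1 << bit else val & ~mask


def extr(val, s, l):
    return val >> s & ((1 << l) - 1)


def extrs(val, s, l):
    return sext(extr(val, s, l), l - 1)


word = 0xC350  # hypothetical value: bits 15-12 = 0xc, bits 11-8 = 0x3, bits 7-0 = 0x50

print(extr(word, 8, 8), extrs(word, 8, 8))    # 195 -61  (0xc3 read both ways)
print(extr(word, 12, 4), extrs(word, 12, 4))  # 12 -4
print(extr(word, 4, 4), extrs(word, 4, 4))    # 5 5  (top bit of the field is 0)
```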
CHAPTER 2


nVidia hardware documentation 


Contents: 


2.1 nVidia GPU introduction 


Contents

* nVidia GPU introduction
  - Introduction
  - Card schematic
  - GPU schematic - NV3:G80
  - GPU schematic - G80:GF100
  - GPU schematic - GF100-


2.1.1 Introduction 


This file is a short introduction to nvidia GPUs and graphics cards. Note that the schematics shown here are simplified 
and do not take all details into account - consult specific unit documentation when needed. 


2.1.2 Card schematic 


An nvidia-based graphics card is made of a main GPU chip and many supporting chips. Note that the following 
schematic attempts to show as many chips as possible - not all of them are included on all cards. 
[Figure: card schematic. The GPU connects to VRAM (memory bus), the PCI/AGP/PCIE host bus, the BIOS ROM (parallel
or SPI), the HDCP ROM (I2C bus), the voltage regulator (VID GPIO), a thermal monitoring chip with fan control (I2C bus,
ALERT GPIO), the fan (FAN GPIO), SPDIF (audio input, HDMI bypass), the configuration straps, VGA and DVI-I connectors
(analog video, TMDS video, I2C bus, GPIO), an external TV encoder driving the TV connector (videolink out, I2C bus),
SLI connectors (videolink in/out), a TV decoder fed from the TV input (ITU-R-656, I2C bus), and an MPEG decoder
(media port).]
Note: while this schematic shows a TV output using an external encoder chip, newer cards have an internal TV
encoder and can connect the output directly to the GPU. Also, external encoders are not limited to TV outputs -
they're also used for TMDS, DisplayPort and LVDS outputs on some cards.


Note: in many cases, I2C buses can be shared between various devices even when not shown by the above schematic.


In summary, a card contains: 


* a GPU chip [see GPU chips for a list] 


* a PCI, AGP, or PCI-Express host interface 


* on-board GPU memory [aka VRAM] - depending on GPU, various memory types can be supported: VRAM,
  EDO, SGRAM, SDR, DDR, DDR2, GDDR3, DDR3, GDDR5.

* a parallel or SPI-connected flash ROM containing the video BIOS. The BIOS image, in addition to standard
  VGA BIOS code, contains information about the devices and connectors present on the card and scripts to boot
  up and manage devices on the card.

* configuration straps - a set of resistors used to configure various functions of the card that need to be up
  before the card is POSTed.

* a small I2C EEPROM with encrypted HDCP keys [optional, some G84:GT215, now discontinued in favor of
  storing the keys in fuses on the GPU]

* a voltage regulator [starting with NV10 [?] family] - starting with roughly NV30 family, the target voltage
  can be set via GPIO pins on the GPU. The voltage regulator may also have “power good” and “emergency
  shutdown” signals connected to the GPU via GPIOs. In some rare cases, particularly on high-end cards, the
  voltage regulator may also be accessible via I2C.

* optionally [usually on high-end cards], a thermal monitoring chip accessible via I2C, to supplement/replace the
  builtin thermal sensor of the GPU. May or may not include autonomous fan control and fan speed measurement
  capability. Usually has a “thermal alert” pin connected to a GPIO.

* a fan - control and speed measurement done either by the thermal monitoring chip, or by the GPU via GPIOs.

* SPDIF input [rare, some G84:GT215] - used for audio bypass to HDMI-capable TMDS outputs; newer GPUs
  include a builtin audio codec instead.

* on-chip video outputs - video output connectors connected directly to the GPU. Supported output types depend
  on the GPU and include VGA, TV [composite, S-Video, or component], TMDS [i.e. the protocol used in DVI
  digital and HDMI], FPD-Link [aka LVDS], DisplayPort.

* external output encoders - usually found with older GPUs which don't support TV, TMDS or FPD-Link outputs
  directly. The encoder is connected to the GPU via a parallel data bus [“videolink”] and a controlling I2C bus.

* SLI connectors [optional, newer high-end cards only] - video links used to transmit video to display from slave
  cards in SLI configuration to the master. Uses the same circuitry as outputs to external output encoders.

* TV decoder chip [sometimes with a tuner] connected to the capture port of the GPU and to an I2C bus - rare, on
  old cards only

* external MPEG decoder chip connected to the so-called mediaport on the GPU - alleged to exist on some
  NV3/NV4/NV10 cards, but never seen in the wild


In addition to normal cards, nvidia GPUs may be found integrated on motherboards - in this case they often lack their
own BIOS and HDCP ROMs, having them integrated with the main system ROM instead. There are also IGPs
[Integrated Graphics Processors], which are a special variant of GPU integrated into the main system chipset. They
don't have on-board memory or a memory controller, sharing the main system RAM instead.


2.1.3 GPU schematic - NV3:G80 


[Figure: GPU schematic, NV3:G80. Blocks shown: PMC+PBUS on the PCI/AGP/PCIE bus; PTIMER+PPMI; PFB connected
to VRAM; PROM; PSTRAPS; the SYSRAM and VRAM access buses; PFIFO feeding PGRAPH and PVPE; PCOUNTER; the thermal
sensor; PMEDIA with the video input and an external MPEG decoder; PVIDEO, PCRTC and I2C+GPIO; PTV with the TV out;
PRAMDAC with the video output; PCLOCK+PCONTROL.]


The GPU is made of: 
* control circuitry: 
- PMC: master control area 
- PBUS: bus control and an area where “misc” registers are thrown in. Known to contain at least:
  * HWSQ, a simple script engine, can poke card registers and sleep in a given sequence [NV17+]
  * a thermal sensor [NV30+]
  * clock gating control [NV17+]
  * indirect VRAM access from host circuitry [NV30+]
  * ROM timings control
  * PWM controller for fans and panel backlight [NV17+]
- PPMI: PCI Memory Interface, handles SYSRAM accesses from other units of the GPU
- PTIMER: measures wall time and delivers alarm interrupts
- PCLOCK+PCONTROL: clock generation and distribution [contained in PRAMDAC on pre-NV40 GPUs]
- PFB: memory controller and arbiter
- PROM: VBIOS ROM access
- PSTRAPS: configuration straps access
* processing engines:
- PFIFO: gathers processing commands from the command buffers prepared by the host and delivers them
  to PGRAPH and PVPE engines in an orderly manner
- PGRAPH: memory copying, 2d and 3d rendering engine
- PVPE: a trio of video decoding/encoding engines
  * PMPEG: MPEG1 and MPEG2 mocomp and IDCT decoding engine [NV17+]
  * PME: motion estimation engine [NV40+]
  * PVP1: VP1 video processor [NV41+]
- PCOUNTER: performance monitoring counters for the processing engines and memory controller

* display engines:
- PCRTC: generates display control signals and reads framebuffer data for display, present in two instances
  on NV11+ cards; also handles GPIO and I2C
- PVIDEO: reads and preprocesses overlay video data
- PRAMDAC: multiplexes PCRTC, PVIDEO and cursor image data, applies palette LUT, converts to output
  signals, present in two instances on NV11+ cards; on pre-NV40 cards also deals with clock generation
- PTV: an on-chip TV encoder

* misc engines:
- PMEDIA: controls video capture input and the mediaport, acts as a DMA controller for them


Almost all units of the GPU are controlled through MMIO registers accessible by a common bus and visible through
PCI BAR0 [see PCI BARs and other means of accessing the GPU]. This bus is not shown above.
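On Linux, one way to reach these registers from user space is the sysfs file that exposes a PCI BAR (resource0 for BAR0). The sketch below maps such a file and reads one 32-bit register. It assumes little-endian register access and root privileges, the PCI address in the comment is hypothetical, and Python does not promise that the read is issued as a single aligned 32-bit load, so real tools do this in C. Because the mapping code works on any file, it can be tried on an ordinary file first.

```python
import mmap
import struct


def read32(path, offset):
    # Map the whole file read-only and decode the 4 bytes at `offset`
    # as a little-endian unsigned 32-bit value (an assumption for GPU MMIO).
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return struct.unpack_from("<I", m, offset)[0]


# Hypothetical device address; requires root and a real GPU:
# value = read32("/sys/bus/pci/devices/0000:01:00.0/resource0", 0x000000)
```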


2.1.4 GPU schematic - G80:GF100 


[Figure: GPU schematic, G80:GF100. Blocks shown: PMC+PBUS on the PCIE bus; PFB connected to VRAM, with memory
partitions; PTHERM; PGRAPH; PDAEMON; PFIFO; PCOUNTER; PNVIO; PCOPY; PFUSE; PDISPLAY; PVCOMP; PKFUSE; PCODEC; the
video decoding and crypt engines; PMEDIA.]


The GPU is made of: 
* control circuitry:
- PMC: master control area
- PBUS: bus control and an area where “misc” registers are thrown in. Known to contain at least:
  * HWSQ, a simple script engine, can poke card registers and sleep in a given sequence
  * clock gating control
  * indirect VRAM access from host circuitry
- PTIMER: measures wall time and delivers alarm interrupts
- PCLOCK+PCONTROL: clock generation and distribution
- PTHERM: thermal sensor and clock throttling circuitry
- PDAEMON: card management microcontroller
- PFB: memory controller and arbiter

* processing engines:
- PFIFO: gathers processing commands from the command buffers prepared by the host and delivers them
  to PGRAPH and PVPE engines in an orderly manner
- PGRAPH: memory copying, 2d and 3d rendering engine
- video decoding engines, see below
- PCOPY: asynchronous copy engine
- PVCOMP: video compositing engine
- PCOUNTER: performance monitoring counters for the processing engines and memory controller
* display and IO port units:
- PNVIO: deals with misc external devices
  * GPIOs
  * fan PWM controllers
  * I2C bus controllers
  * videolink controls
  * ROM interface
  * straps interface
  * PNVIO/PDISPLAY clock generation
- PDISPLAY: a unified display engine
- PCODEC: audio codec for HDMI audio
* misc engines:
- PMEDIA: controls video capture input and the mediaport, acts as a DMA controller for them


2.1.5 GPU schematic - GF100- 


Todo: finish file 


2.2 GPU chips 


Contents

* GPU chips
  - Introduction
  - The GPU families
    * NV1 family: NV1
    * NV3 (RIVA) family: NV3, NV3T
    * NV4 (TNT) family: NV4, NV5
    * Celsius family: NV10, NV15, NV1A, NV11, NV17, NV1F, NV18
    * Kelvin family: NV20, NV2A, NV25, NV28
    * Rankine family: NV30, NV35, NV31, NV36, NV34
    * Curie family
    * Tesla family
    * Fermi/Kepler/Maxwell/Pascal/Volta/Turing family
  - Comparison table


2.2.1 Introduction 


Each nvidia GPU has several identifying numbers that can be used to determine supported features, the engines it
contains, and the register set. The most important of these numbers is an 8-bit number known as the “GPU id”. If two
cards have the same GPU id, their GPUs support identical features, engines, and registers, with very minor exceptions.
Such cards can however still differ in the external devices they contain: output connectors, encoders, capture chips,
temperature sensors, fan controllers, installed memory, supported clocks, etc. You can get the GPU id of a card by
reading from its PMC area.


The GPU id is usually written as NVxx, where xx is the id written as an uppercase hexadecimal number. Note that,
while cards before NV10 used another format for their ID register and don't have the GPU id stored directly, they are
usually referred to as NV1-NV5 anyway.
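A sketch of the NVxx convention: the helpers below extract the id from a raw PMC value and format it. The register location (BOOT_0 at MMIO offset 0, GPU id in bits 20-27, NV10 and later only) is an assumption taken from nouveau/envytools conventions rather than from this text, and the example value is made up.

```python
def gpu_id_from_boot0(boot0):
    # ASSUMPTION: on NV10+ GPUs the 8-bit GPU id occupies bits 20-27 of
    # PMC BOOT_0 (MMIO 0x000000). Not valid for NV1-NV5, see above.
    return boot0 >> 20 & 0xff


def gpu_id_name(gpu_id):
    # "NVxx", with xx the id as two uppercase hex digits.
    return "NV%02X" % gpu_id


boot0 = 0x092000A2                            # hypothetical BOOT_0 value
print(gpu_id_name(gpu_id_from_boot0(boot0)))  # NV92 (code name G92)
print(gpu_id_name(0x4A))                      # NV4A (code name NV44A)
```

The second line shows the divergence described below: GPU id 0x4a is written NV4A, while nvidia calls the chip NV44A.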


Nvidia uses “GPU code names” in their materials. They started out identical to the GPU id, but diverged midway
through the NV40 series and started using a different numbering. However, for the most part nvidia code names
correspond 1 to 1 with the GPU ids.
The GPU id has a mostly one-to-many relationship with pci device ids. Note that the last few bits [0-6 depending on
GPU] of PCI device id are changeable through straps [see pstraps]. When pci ids of a GPU are listed in this file, the
following shorthands are used:


1234     PCI device id 0x1234
1234*    PCI device ids 0x1234-0x1237, choosable by straps
123X     PCI device ids 0x1230-0x123f, choosable by straps
124X+    PCI device ids 0x1240-0x125f, choosable by straps
124X*    PCI device ids 0x1240-0x127f, choosable by straps
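The shorthand can be decoded mechanically. In this sketch, expand_pciid is a hypothetical helper; reading "*" as 4 consecutive units and "+" as 2 is inferred from the examples above, where a unit is a single id, or a block of 16 ids when the last digit is X. "+" is accepted only together with X, since that is the only form listed.

```python
def expand_pciid(spec):
    """Return the inclusive (first, last) PCI device id range of a shorthand."""
    suffix = spec[-1] if spec[-1] in "*+" else ""
    body = spec[:len(spec) - len(suffix)]
    if len(body) != 4:
        raise ValueError("expected 4 hex digits: %r" % spec)
    wildcard = body[-1] == "X"
    if suffix == "+" and not wildcard:
        # Only the "124X+" form is listed above; reject anything else.
        raise ValueError("'+' is only documented with an X wildcard: %r" % spec)
    first = int(body[:-1] + ("0" if wildcard else body[-1]), 16)
    block = 16 if wildcard else 1            # X stands for one hex digit
    count = {"": 1, "+": 2, "*": 4}[suffix]  # number of consecutive units
    return first, first + block * count - 1


for spec in ["1234", "1234*", "123X", "124X+", "124X*"]:
    first, last = expand_pciid(spec)
    print("%-6s 0x%04x-0x%04x" % (spec, first, last))
```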


2.2.2 The GPU families 


The GPUs can roughly be grouped into a dozen or so families: NV1, NV3/RIVA, NV4/TNT, Celsius, Kelvin, Rankine, 
Curie, Tesla, Fermi, Kepler, Maxwell, Pascal, Volta and Turing. 


This aligns with big revisions of PGRAPH, the drawing engine of the card. While most functionality was introduced
in sync with PGRAPH revisions, some other functionality [notably video decoding hardware] was added late in a GPU
family and sometimes didn't even make it into the first GPU of the next GPU family. For example, NV11 expanded upon
the previous NV15 chipset by adding dual-head support, while NV20 added a new PGRAPH revision with shaders, but
didn't have dual-head - the first GPU to feature both was NV25.


Also note that a bigger GPU id doesn't always mean a newer card / card with more features: there were quite a few 
places where the numbering actually went backwards. For example, NV11 came out later than NV15 and added 
several features. 


Nvidia's card release cycle always has the most powerful high-end GPU first, subsequently filling in the lower-end 
positions with new cut-down GPUs. This means that newer cards in a single sub-family get progressively smaller, 
but also more featureful - the first GPUs to introduce minor changes like DX10.1 support or new video decoding are 
usually the low-end ones. 


Whenever a range of GPUs is mentioned in the documentation, it's written as “NVxx:NVyy”. This is a left-inclusive,
right-exclusive range of GPU ids as sorted in the following list. For example, G200:GT218 means GPUs G200,
MCP77, MCP79, GT215, GT216. NV20:NV30 effectively means all NV20 family GPUs.


The full known GPU list, sorted roughly according to introduced features, is:

* NV1 family: NV1
* NV3 (aka RIVA) family: NV3, NV3T
* NV4 (aka TNT) family: NV4, NV5
* Celsius family: NV10, NV15, NV1A, NV11, NV17, NV1F, NV18
* Kelvin family: NV20, NV2A, NV25, NV28
* Rankine family: NV30, NV35, NV31, NV36, NV34
* Curie family:
  - NV40 subfamily: NV40, NV45, NV41, NV42, NV43, NV44, NV44A
  - G70 subfamily: G70, G71, G73, G72
  - the IGPs: C51, MCP61, MCP67, MCP68, MCP73
  - the special snowflake: RSX
* Tesla family:
  - G80 subfamily: G80
  - G84 subfamily: G84, G86, G92, G94, G96, G98
  - G200 subfamily: G200, MCP77, MCP79
  - GT215 subfamily: GT215, GT216, GT218, MCP89
* Fermi family:
  - GF100 subfamily: GF100, GF104, GF106, GF114, GF116, GF108, GF110
  - GF119 subfamily: GF119, GF117
* Kepler family: GK104, GK107, GK106, GK110, GK110B, GK208, GK208B, GK20A, GK210
* Maxwell family: GM107, GM108, GM204, GM200, GM206, GM20B
* Pascal family: GP100, GP102, GP104, GP106, GP107, GP108
* Volta family: GV100
* Turing family: TU102, TU104, TU106, TU116, TU117
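The range notation can be checked mechanically against the list above. In this sketch GPU_ORDER is transcribed from that list, and gpu_range / in_range are hypothetical helpers; an endpoint that is not in the list raises ValueError from list.index.

```python
GPU_ORDER = """
NV1 NV3 NV3T NV4 NV5
NV10 NV15 NV1A NV11 NV17 NV1F NV18
NV20 NV2A NV25 NV28
NV30 NV35 NV31 NV36 NV34
NV40 NV45 NV41 NV42 NV43 NV44 NV44A G70 G71 G73 G72 C51 MCP61 MCP67 MCP68 MCP73 RSX
G80 G84 G86 G92 G94 G96 G98 G200 MCP77 MCP79 GT215 GT216 GT218 MCP89
GF100 GF104 GF106 GF114 GF116 GF108 GF110 GF119 GF117
GK104 GK107 GK106 GK110 GK110B GK208 GK208B GK20A GK210
GM107 GM108 GM204 GM200 GM206 GM20B
GP100 GP102 GP104 GP106 GP107 GP108
GV100
TU102 TU104 TU106 TU116 TU117
""".split()


def gpu_range(spec, order=GPU_ORDER):
    # "A:B" is left-inclusive, right-exclusive over the list order.
    start, end = spec.split(":")
    return order[order.index(start):order.index(end)]


def in_range(gpu, spec, order=GPU_ORDER):
    # True if `gpu` falls inside the range `spec`.
    return gpu in gpu_range(spec, order)


print(gpu_range("G200:GT218"))  # ['G200', 'MCP77', 'MCP79', 'GT215', 'GT216']
print(in_range("G92", "G84:GT215"), in_range("GT215", "G84:GT215"))  # True False
```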


NV1 family: NV1 


gpu-gen NV1

The first generation of nVidia GPUs. Includes only one GPU - the NV1. It has semi-legendary status, as it's 
very rare and hard to get. The GPU is also known by its SGS-Thomson code number, STG-2000. The most 
popular card using this GPU is Diamond EDGE 3D. 


This GPU is unusual for multiple reasons: 


* It has a builtin sound mixer with a MIDI synthesizer (aka PAUDIO). It is supposed to be paired with an
  audio codec (AD1848) for full integrated soundcard functionality.
* It is not fully VGA-compatible - there is some VGA emulation, but it's quite rough and many features are
  not supported.
* It has no integrated DAC or clock generators - it has to be paired with an accompanying external DAC,
  the STG-1732 or STG-1764, that will convert raw framebuffer contents to display pixels. It is also charged
  with generating the clocks for the GPU.
* The accompanying DAC chip also contains game port functionality, for a complete soundcard replacement.
* As if the game port was not enough, the DAC also supports two Sega Saturn controller ports.
* The so-called 3D engine renders textured quadratic surfaces, instead of triangles (as opposed to all later
  GPUs). Rendering triangles with it is pretty much impossible.
* The GPU was jointly manufactured by SGS-Thomson and nVidia, and uses SGS' PCI vendor ID (there are
  apparently variants using nVidia's vendor id, but not much is known about these).


There's also NV2, which has even more legendary status. It was supposed to be another card based on quadratic 
surfaces, but it got stuck in development hell and never got released. Apparently it never got to the stage of 
functioning silicon. The device id of NV2 was supposed to be 0x0010. 


NV3 (RIVA) family: NV3, NV3T 


gpu-gen NV3 


The first [moderately] sane GPUs from nvidia, and also the first to use the AGP bus. There are two chips in this
family, and confusingly both use GPU id NV3, but can be told apart by revision. The original NV3 is used in
RIVA 128 cards, while the revised NV3, known as NV3T, is used in RIVA 128 ZX. NV3 supports AGP 1x and a
maximum of 4MB of VRAM, while NV3T supports AGP 2x and 8MB of VRAM. NV3T also increased the number
of slots in the PFIFO cache. These GPUs were also manufactured by SGS-Thomson and bear the code name of
STG-3000.


The NV3 GPU is made of the following functional blocks: 


* host interface, connected to the host machine via PCI or AGP
* two PLLs, to generate video pixel clock and memory clock
* memory interface, connected to 2MB-8MB of external VRAM via 64-bit or 128-bit memory bus, shared
  with an 8-bit parallel flash ROM
* PFIFO, controlling command submission to PGRAPH and gathering commands through DMA to host
  memory or direct MMIO submission
* PGRAPH, the 2d/3d drawing engine, supporting Windows GDI and Direct3D 5 acceleration
* VGA-compatible CRTC, RAMDAC, and associated video output circuitry, enabling direct connection of
  VGA analog displays and TV connection via an external AD722 encoder chip
* i2c bus to handle DDC and control mediaport devices
* double-buffered video overlay and cursor circuitry in RAMDAC
* mediaport, a proprietary interface with ITU656 compatibility mode, allowing connection of external video
  capture or MPEG2 decoding chip


NV3 introduced RAMIN, an area of memory at the end of VRAM used to hold various control structures for
PFIFO and PGRAPH. On NV3, RAMIN can be accessed in BAR1 at addresses starting from 0xc00000, while
later cards have it in BAR0. It also introduced DMA objects, a RAMIN structure used to define a VRAM
or host memory area that PGRAPH is allowed to use when executing commands on behalf of an application.
These early DMA objects are limited to linear VRAM and paged host memory objects, and have to be switched
manually by host. See NV3 DMA objects for details.


NV4 (TNT) family: NV4, NV5


gpu-gen NV4
Improved and somewhat redesigned NV3. Notable changes:

* AGP x4 support
* redesigned and improved DMA command submission
* separated core and memory clocks
* DMA objects made more orthogonal, and switched automatically by card
* redesigned PGRAPH objects, introducing the concept of object class in hardware
* added BIOS ROM shadow in RAMIN
* Direct3D 6 / multitexturing support in PGRAPH
* bumped max supported VRAM to 16MB
* [NV5] bumped max supported VRAM to 32MB
* [NV5] PGRAPH 2d context object binding in hardware


This family includes the original NV4, used in RIVA TNT cards, and NV5 used in RIVA TNT2 and Vanta cards. 
Celsius family: NV10, NV15, NV1A, NV11, NV17, NV1F, NV18
gpu-gen Celsius
The notable changes in this generation are:

* NV10:
  - redesigned memory controller
  - max VRAM bumped to 128MB
  - redesigned VRAM tiling, with support for multiple tiled regions
  - greatly expanded 3d engine: hardware T&L, D3D7, and other features
  - GPIO pins introduced for ???
  - PFIFO: added REF_CNT and NONINC commands
  - added PCOUNTER: the performance monitoring engine
  - new and improved video overlay engine
  - redesigned mediaport
* NV15:
  - introduced vblank wait PGRAPH commands
  - minor 3d engine additions [logic operation, ...]
* NV1A:
  - big endian mode
  - PFIFO: semaphores and subroutines
* NV11:
  - dual head support, meant for laptops with flat panel + external display
* NV17:
  - builtin TV encoder
  - ZCULL
  - added VPE: MPEG2 decoding engine
* NV18:
  - AGP x8 support
  - second straps set


Todo: what were the GPIOs for?
pciid | GPU  | pixel pipelines and ROPs | texture units | date       | notes
0100* | NV10 | 4                        | 4             | 11.10.1999 | the first GeForce card [GeForce 256]
0150* | NV15 | 4                        | 8             | 26.04.2000 | the high-end card of GeForce 2 lineup [GeForce 2 Ti, ...]
01a0* | NV1A | 2                        | 4             | 04.06.2001 | the IGP of GeForce 2 lineup [nForce]
0110* | NV11 | 2                        | 4             | 28.06.2000 | the low-end card of GeForce 2 lineup [GeForce 2 MX]
017X  | NV17 | 2                        | 4             | 06.02.2002 | the low-end card of GeForce 4 lineup [GeForce 4 MX]
01fX  | NV1F | 2                        | 4             | 01.10.2002 | the IGP of GeForce 4 lineup [nForce 2]
018X  | NV18 | 2                        | 4             | 25.09.2002 | like NV17, but with added AGP x8 support


NV1A and NV1F are IGPs and lack VRAM, memory controller, mediaport, and ROM interface. They use the internal
interfaces of the northbridge to access an area of system memory set aside as fake VRAM and BIOS image.


Kelvin family: NV20, NV2A, NV25, NV28 


gpu-gen Kelvin 
The first cards of this family were actually developed before NV17, so they miss out on several features
introduced in NV17. The first card to merge NV20 and NV17 additions is NV25. Notable changes:


* NV20:
  - no dual head support again
  - no PTV, VPE
  - no ZCULL
  - a new memory controller with Z compression
  - RAMIN reversal unit bumped to 0x40 bytes
  - 3d engine extensions:
    * programmable vertex shader support
    * D3D8, shader model 1.1
  - PGRAPH automatic context switching
* NV25:
  - a merge of NV17 and NV20: has dual-head, ZCULL, ...
  - still no VPE and PTV
* NV28:
  - AGP x8 support


The GPUs are:
pciid | GPU  | vertex shaders | pixel pipelines and ROPs | texture units | date       | notes
0200* | NV20 | 1              | 4                        | 8             | 27.02.2001 | the only GPU of GeForce 3 lineup [GeForce 3 Ti, ...]
02a0* | NV2A | 2              | 4                        | 8             | 15.11.2001 | the XBOX IGP [XGPU]
025X  | NV25 | 2              | 4                        | 8             | 06.02.2002 | the high-end GPU of GeForce 4 lineup [GeForce 4 Ti]
028X  | NV28 | 2              | 4                        | 8             | 20.01.2003 | like NV25, but with added AGP x8 support


NV2A is a GPU designed exclusively for the original xbox, and can’t be found anywhere else. Like NV1A and NV1F,
it's an IGP.


Todo: verify all sorts of stuff on NV2A 


Rankine family: NV30, NV35, NV31, NV36, NV34 


gpu-gen Rankine 
The infamous GeForce FX series. Notable changes: 


* NV30:
  - 2-stage PLLs introduced [still located in PRAMDAC]
  - max VRAM size bumped to 256MB
  - 3d engine extensions:
    * programmable fragment shader support
    * D3D9, shader model 2.0
  - return of VPE and PTV
  - new-style memory timings
* NV35:
  - added PEEPHOLE indirect memory access
  - 3d engine now supports depth bounds check
* NV31:
  - no NV35 changes, this GPU is derived from NV30
  - 2-stage PLLs split into two registers
  - VPE engine extended to work as a PFIFO engine
* NV36:
  - a merge of NV31 and NV35 changes from NV30
* NV34:
  - a comeback of NV10 memory controller!
  - NV10-style mem timings again
  - no Z compression again
  - RAMIN reversal unit back at 16 bytes
  - 3d engine additions:
    * ???


Todo: figure out NV34 3d engine changes 


The GPUs are: 
pciid | GPU  | vertex shaders | pixel pipelines and ROPs | date       | notes
030X  | NV30 | 2              | 8                        | 27.01.2003 | high-end GPU [GeForce FX 5800]
033X  | NV35 | 3              | 8                        | 12.05.2003 | very high-end GPU [GeForce FX 59X0]
031X  | NV31 | 1              | 4                        | 06.03.2003 | low-end GPU [GeForce FX 5600]
034X  | NV36 | 3              | 4                        | 23.10.2003 | middle-end GPU [GeForce FX 5700]
032X  | NV34 | 1              | 4                        | 06.03.2003 | low-end GPU [GeForce FX 5200]


The pci vendor id is 0x10de.


Curie family 


gpu-gen Curie 
This family was the first to feature PCIE cards, and many fundamental areas got significant changes, which later 
paved the way for G80. It is also the family where GPU ids started to diverge from nvidia code names. The 
changes: 


* NV40:
  - RAMIN bumped in size to max 16MB, many structure layout changes
  - RAMIN reversal unit bumped to 512kB
  - 3d engine: support for shader model 3 and other additions
  - Z compression came back
  - PGRAPH context switching microcode
  - redesigned clock setup
  - separate clock for shaders
  - rearranged PCOUNTER to handle up to 8 clock domains
  - PFIFO cache bumped in size and moved location
  - added independent PRMVIO for two heads
  - second set of straps added, new strap override registers
  - new PPCI PCI config space access window
  - MPEG2 encoding capability added to VPE
  - FIFO engines now identify the channels by their context addresses, not chids
  - BIOS uses all-new BIT structure to describe the card
  - individually disablable shader and ROP units
  - added PCONTROL area to... control... stuff?
  - memory controller uses NV30-style timings again
* NV41:
  - introduced context switching to VPE
  - introduced PVP1, microcoded video processor
  - first natively PCIE card
  - added PCIE GART to memory controller
* NV43:
  - added a thermal sensor to the GPU
* NV44:
  - a new PCIE GART page table format
  - 3d engine: ???
* NV44A:
  - like NV44, but AGP instead of PCIE


Todo: more changes 


Todo: figure out 3d engine changes 


The GPUs are: 
pciid      | GPU id    | GPU names      | vertex shaders | pixel shaders | ROPs | date       | notes
004X, 021X | 0x40/0x45 | NV40/NV45/NV48 | 6              | 16            | 16   | 14.04.2004 | AGP
00cX       | 0x41/0x42 | NV41/NV42      | 5              | 12            | 12   | 08.11.2004 |
014X       | 0x43      | NV43           | 3              | 8             | 4    | 12.08.2004 |
016X       | 0x44      | NV44           | 3              | 4             | 2    | 15.12.2004 | TURBOCACHE
022X       | 0x4a      | NV44A          | 3              | 4             | 2    | 04.04.2005 | AGP
009X       | 0x47      | G70            | 8              | 24            | 16   | 22.06.2005 |
01dX       | 0x46      | G72            | 3              | 4             | 2    | 18.01.2006 | TURBOCACHE
029X       | 0x49      | G71            | 8              | 24            | 16   | 09.03.2006 |
039X       | 0x4b      | G73            | 8              | 12            | 8    | 09.03.2006 |
024X       | 0x4e      | C51            | 1              | 2             | 1    | 20.10.2005 | IGP, TURBOCACHE
03dX       | 0x4c      | MCP61          | 1              | 2             | 1    | 22.06.2006 | IGP, TURBOCACHE
053X       | 0x67      | MCP67          | 1              | 2             | 2    | 01.02.2006 | IGP, TURBOCACHE
053X       | 0x68      | MCP68          | 1              | 2             | 2    | 22.07.2007 | IGP, TURBOCACHE
07eX       | 0x63      | MCP73          | 1              | 2             | 2    | 27.07.2007 | IGP, TURBOCACHE
-          | 0x44      | RSX            | ?              | ?             | ?    | 11.11.2006 | FlexIO bus interface, used in PS3


Todo: all geometry information unverified 
Todo: any information on the RSX?


It's not clear how NV40 is different from NV45, or NV41 from NV42, or MCP67 from MCP68 - they even share pciid 
ranges. 


The NV4x IGPs actually have a memory controller as opposed to earlier ones. This controller still accesses only host 
memory, though. 


As execution units can be disabled on NV40+ cards, these configs are just the maximum configs - a card can have just 
a subset of them enabled. 


Tesla family 


gpu-gen Tesla 
The card where they redesigned everything. The most significant change was the redesigned memory subsystem, 
complete with a paging MMU [see Tesla virtual memory]. 


* G80:
  - a new VM subsystem, complete with redesigned DMA objects
  - RAMIN is gone: all structures can be placed arbitrarily in VRAM, and usually in host memory as well
  - all-new channel structure storing page tables, RAMFC, RAMHT, context pointers, and DMA objects
  - PFIFO redesigned, PIO mode dropped
  - PGRAPH redesigned: based on unified shader architecture, now supports running standalone computations,
    D3D10 support, unified 2d acceleration object
  - display subsystem reinvented from scratch: a stub version of the old VGA-based one remains for VGA
    compatibility, the new one is not VGA based and is controlled by PFIFO-like DMA push buffers
  - memory partitions tied directly to ROPs
* G84:
  - redesigned channel structure with a new layout
  - got rid of VP1 video decoding and VPE encoding support, but VPE decoder still exists
  - added VP2 xtensa-based programmable video decoding and BSP engines
  - removed restrictions on host memory access by rendering: rendering to host memory and using blocklinear
    textures from host are now ok
  - added VM stats write support to PCOUNTER
  - PEEPHOLE moved out of PBUS
  - PFIFO BAR FLUSH moved out of PFIFO
* G98:
  - introduced VP3 video decoding engines, and the falcon microcode with them
  - got rid of VP2 video decoding
* G200:
  - developed in parallel with G98
  - VP2 again, no VP3
  - PGRAPH rearranged to make room for more MPs/TPCs
  - streamout enhancements [ARB_transform_feedback2]
  - CUDA ISA 1.3: 64-bit g[] atomics, s[] atomics, voting, fp64 support
* MCP77:
  - merged G200 and G98 changes: has both VP3 and new PGRAPH
  - only CUDA ISA 1.2 now: fp64 support got cut out again
* GT215:
  - a new revision of the falcon ISA
  - a revision to VP3 video decoding, known as VP4. Adds MPEG-4 ASP support.
  - added PDAEMON, a falcon engine meant to do card monitoring and power management
  - PGRAPH additions for D3D10.1 support
  - added HDA audio codec for HDMI sound support, on a separate PCI function
  - added PCOPY, the dedicated copy engine
  - merged PSEC functionality into PVLD
* MCP89:
  - added PVCOMP, the video compositor engine
The GPUs in this family are:

core pciid | hda pciid | id | name | TPCs | MPs/TPC | PARTs | date | notes
019X- | - | 0x50 | G80 | 8 | 2 | 6 | 08.11.2006 |
040X- | - | 0x84 | G84 | 2 | 2 | 2 | 17.04.2007 |
042X- | - | 0x86 | G86 | 1 | 2 | 2 | 17.04.2007 |
060X+ | - | 0x92 | G92 | 8 | 2 | 4 | 29.10.2007 |
062X- | - | 0x94 | G94 | 4 | 2 | 4 | 29.07.2008 |
064X+ | - | 0x96 | G96 | 2 | 2 | 2 | 29.07.2008 |
06eX+ | - | 0x98 | G98 | 1 | 1 | 1 | 04.12.2007 |
05eX+ | - | 0xa0 | G200 | 10 | 3 | 8 | 16.06.2008 |
084X+ | - | 0xaa | MCP77/MCP78 | 1 | 1 | 1 | 22.06.2008 | IGP
086X+ | - | 0xac | MCP79/MCP7A | 1 | 2 | 1 | 22.06.2008 | IGP
0caX+ | 0be4 | 0xa3 | GT215 | 4 | 3 | 2 | 15.06.2009 |
0a2X- | 0be2 | 0xa5 | GT216 | 2 | 3 | 2 | 15.06.2009 |
0a6X- | 0be3 | 0xa8 | GT218 | 1 | 2 | 1 | 15.06.2009 |
08aX+ | - | 0xaf | MCP89 | 2 | 3 | 2 | 01.04.2010 | IGP


Like NV40, these are just the maximal numbers.


Todo: geometry information not verified for G94, MCP77 
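This worked formula assumes that MPs/TPC is a per-TPC count. Under that assumption, the maximum number of MPs on a Tesla chip is the product of the TPCs and MPs/TPC columns. The example values come from rows of the table. Like the table, the formula gives only an upper bound, because units can be disabled on individual cards:

```latex
N_{\mathrm{MP}} = N_{\mathrm{TPC}} \times N_{\mathrm{MP/TPC}}
\qquad
\text{G80: } 8 \times 2 = 16, \quad
\text{G200: } 10 \times 3 = 30, \quad
\text{GT218: } 1 \times 2 = 2
```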


Fermi/Kepler/Maxwell/Pascal/Volta/Turing family 


gpu-gen Fermi 
The card where they redesigned everything again. 
* GF100:
  - redesigned PFIFO, now with up to 3 subfifos running in parallel
  - redesigned PGRAPH:
    * split into a central HUB managing everything and several GPCs doing all actual work
    * GPCs further split into a common part and several TPCs
    * using falcon for context switching
    * D3D11 support
  - redesigned memory controller
    * split into three parts:
      - per-partition low-level memory controllers [PBFB]
      - per-partition middle memory controllers: compression, ECC, ... [PMFB]
      - a single "hub" memory controller: VM control, TLB control, ... [PFFB]
  - memory partitions, GPCs, TPCs have independent register areas, as well as "broadcast" areas that can
    be used to control all units at once
  - second PCOPY engine
  - redesigned PCOUNTER, now having multiple more or less independent subunits to monitor various
    parts of GPU
  - redesigned clock setting
* GF119:
  - a major revision to VP3 video decoding, now called VP5. vuc microcode removed.
  - another revision to the falcon ISA, allowing 24-bit PC
  - added PUNK1C3 falcon engine
  - redesigned I2C bus interface
  - redesigned PDISPLAY
  - removed second PCOPY engine
* GF117:
  - PGRAPH changes:
    * ???


gpu-gen Kepler 
An upgrade to Fermi. 


* GK104:
  - redesigned PCOPY: the falcon controller is now gone, replaced with hardware control logic, partially
    in PFIFO
  - an additional PCOPY engine
  - PFIFO redesign: a channel can now only access a single engine selected on setup, with
    PCOPY2+PGRAPH considered as one engine
  - PGRAPH changes:
    * subchannel to object assignments are now fixed
    * m2mf is gone and replaced by a new p2mf object that only does simple upload, other m2mf
      functions are now PCOPY's responsibility instead
    * the ISA requires explicit scheduling information now
    * lots of setup has been moved from methods/registers into memory structures
    * ???
* GK110:
  - PFIFO changes:
    * ???
  - PGRAPH changes:
    * ISA format change
    * ???


Todo: figure out PGRAPH/PFIFO changes 


gpu-gen Maxwell
gpu-gen Pascal
gpu-gen Volta
gpu-gen Turing


GPUs in Fermi/Kepler/Maxwell/Pascal/Volta/Turing families: 


core pciid | hda pciid | usb pciid | id | name | GPCs | TPCs/GPC | PARTs | MCs | ZCULLs | PCOPYs | HEADs | UNK7 | P
06cX- | 0be5 | - | 0xc0 | GF100 | 4 | 4 | 6 | [6] | [4] | [2] | [2] | - | -
0e2X- | 0beb | - | 0xc4 | GF104 | 2 | 4 | 4 | [4] | [4] | [2] | [2] | 3 | -
120X+ | 0e0c | - | 0xce | GF114 | 2 | 4 | 4 | [4] | [4] | [2] | [2] | 5 | -
0dcX+ | 0be9 | - | 0xc3 | GF106 | 1 | 4 | 3 | [3] | [4] | [2] | [2] | ? | -
124X+ | 0bee | - | 0xcf | GF116 | 1 | 4 | 3 | [3] | [4] | [2] | [2] | ? | -
0deX+ | 0bea | - | 0xc1 | GF108 | 1 | 2 | 1 | 2 | 4 | [2] | [2] | ? | ?
108X+ | 0e09 | - | 0xc8 | GF110 | 4 | 4 | 6 | [6] | [4] | [2] | [2] | - | -
104X* | 0e08 | - | 0xd9 | GF119 | 1 | 1 | 1 | 1 | 4 | 1 | 2 | - | -
114X | - | - | 0xd7 | GF117 | 1 | 2 | 1 | 1 | 4 | 1 | -[4] | - | 1
118X* | 0e0a | - | 0xe4 | GK104 | 4 | 2 | 4 | 4 | 4 | 3 | 4 | - | 1
0fcX* | 0e1b | - | 0xe7 | GK107 | 1 | 2 | 2 | 2 | 4 | 3 | 4 | - | 1
11cX+ | 0e0b | - | 0xe6 | GK106 | 3 | 2 | 3 | 3 | 4 | 3 | 4 | - | 1
100X+ | 0e1a | - | 0xf0 | GK110 | 5 | 3 | 6 | 6 | 4 | 3 | 4 | - | 2
100X- | 0e1a | - | 0xf1 | GK110B | 5 | 3 | 6 | 6 | 4 | 3 | 4 | - | 2
??? | ??? | - | ??? | GK210 | ? | ? | ? | ? | ? | 2 | ? | - | ?
128X+ | 0e0f | - | 0x108 | GK208 | 1 | 2 | 1 | 1 | 4 | 3 | 4 | - | 1
128X- | 0e0f | - | 0x106 | GK208B | 1 | 2 | 1 | 1 | 4 | 3 | 4 | - | 1
- | - | - | 0xea | GK20A | 1 | 1 | 1 | 1 | 4 | 3 | -[4] | - | 1
138X- | 0fbc | - | 0x117 | GM107 | 1 | 5 | 2 | 2 | 4 | 3 | 4 | 1 | 2
134X+ | ??? | - | 0x118 | GM108 | 1 | 3 | 1 | 1 | 4 | 3 | 4 | 0 | ?
13cX+ | 0fbb | - | 0x124 | GM204 | ? | ? | 7 | ? | ? | ? | ? | ? | ?
17cX+ | 0fb0 | - | 0x120 | GM200 | ? | ? | ? | ? | ? | ? | ? | ? | ?
140X+ | 0fba | - | 0x126 | GM206 | ? | ? | ? | ? | ? | ? | ? | ? | ?
- | - | - | 0x12b | GM20B | ? | ? | ? | ? | ? | 2 | 2 | 2 | ?
158X# | ??? | - | 0x130 | GP100 | ? | ? | ? | ? | ? | ? | ? | ? | ?
1b0X# | 10ef | - | 0x132 | GP102 | ? | ? | ? | ? | ? | ? | ? | ? | ?
1b8X# | 10f0 | - | 0x134 | GP104 | 4 | 5 | 4 | 4 | 4 | 4 | 4 | 2 | ?
1c0X# | 10f1 | - | 0x136 | GP106 | ? | ? | ? | ? | ? | ? | ? | ? | ?
1c8X# | 0fb9 | - | 0x137 | GP107 | ? | ? | ? | ? | ? | ? | ? | ? | ?
1d0X# | 0fb8 | - | 0x138 | GP108 | ? | ? | ? | ? | ? | ? | ? | ? | ?
10e5* | - | - | 0x13b | GP10B | ? | ? | ? | ? | ? | ? | ? | ? | ?
1d8X# | 10f2 | - | 0x140 | GV100 | 6 | 7 | 2 | 2 | 2 | 2 | 2 | 2 | ?
- | - | - | 0x15b | GV11B | ? | ? | ? | ? | ? | ? | ? | ? | ?
1e0X# | 10f7 | 1ad6 | 0x162 | TU102 | 6 | 6 | 7 | ? | ? | ? | ? | ? | ?
1e8X# | 10f8 | 1ad8 | 0x164 | TU104 | 6 | 4 | ? | ? | ? | ? | ? | ? | ?
1f0X# | 10f9 | 1ada | 0x166 | TU106 | 3 | 6 | ? | ? | ? | ? | ? | ? | ?
218X# | 1aeb | - | 0x168 | TU116 | 3 | 4 | ? | ? | ? | ? | ? | ? | ?
1f8X# | - | - | 0x167 | TU117 | 2 | 4 | ? | ? | 2 | 2 | 2 | ? | ?

Todo: it is said that one of the GPCs [0th one] has only one TPC on GK106

Todo: what the fuck is GK110B? and GK208B? 

Todo: GK210 

Todo: GK20A 

Todo: GM20x, GP10x 

Todo: another design counter available on GM107, another 4 on GP10x 

Todo: TU117 one of the GPCs has only three TPCs (so 7 in total, not 8) 
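Multiplying the GPCs and TPCs/GPC columns gives a maximum TPC count. The Todo notes above show that not every GPC is fully populated, though. The following minimal Python sketch multiplies the two columns and lets a per-GPC list override the product. GEOMETRY, PER_GPC_OVERRIDE and max_tpcs are illustrative names, not part of any existing tool. The override lists come only from the Todo notes: the GK106 note is hearsay ("it is said"), and the position of the smaller TU117 GPC is an assumption:

```python
# Maximum geometry copied from the table above: (GPCs, TPCs per GPC).
GEOMETRY = {
    "GF100": (4, 4),
    "GK104": (4, 2),
    "GK106": (3, 2),
    "GK110": (5, 3),
    "GM107": (1, 5),
    "GP104": (4, 5),
    "GV100": (6, 7),
    "TU117": (2, 4),
}

# Per-GPC TPC counts for chips whose GPCs are not all equal, taken from the
# Todo notes. GK106 (GPC 0 smaller) is unverified; for TU117 the notes do
# not say which GPC is the smaller one, so that list order is arbitrary.
PER_GPC_OVERRIDE = {
    "GK106": [1, 2, 2],
    "TU117": [4, 3],
}


def max_tpcs(name: str) -> int:
    """Maximum TPC count of a chip, before any units are disabled on a card."""
    if name in PER_GPC_OVERRIDE:
        return sum(PER_GPC_OVERRIDE[name])
    gpcs, tpcs_per_gpc = GEOMETRY[name]
    return gpcs * tpcs_per_gpc


for chip in ("GF100", "GK106", "GK110", "TU117"):
    print(chip, max_tpcs(chip))
```

This prints 16, 5, 15 and 7. For TU117, the naive product of 2 x 4 = 8 is exactly the overcount the Todo warns about.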

2.2.3 Comparison table 
Name | GPU id | GPU generation | Release date [approximate] | Bus interface | PCI vendor id | PCI device IDs
NV1 | - | NV1 | 09.1995 | PCI | 0x104a | 0x0008-0x0009

26 Chapter 2. nVidia hardware documentation 


nVidia Hardware Documentation, Release git 


NV3 | - | NV3 | 04.1997 | PCI | 0x12d2 | 0x0018-0x0019
NV3T | - | NV3 | 23.02.1998 | PCI | 0x12d2 | 0x0018-0x0019
NV4 | - | NV4 | 23.03.1998 | PCI | 0x10de | 0x0020
NV5 | - | NV4 | 15.03.1999 | PCI | 0x10de | 0x0028-0x002b
NV6 | - | NV4 | 15.03.1999 | PCI | 0x10de | 0x002c-0x002f
NVA | - | NV4 | 08.09.1999 | PCI | 0x10de | 0x00a0
NV10 | 0x010 | Celsius | 11.10.1999 | PCI | 0x10de | 0x0100-0x0103
NV15 | 0x015 | Celsius | 26.04.2000 | PCI | 0x10de | 0x0150-0x0153
NV1A | 0x01a | Celsius | 04.06.2001 | PCI | 0x10de | 0x01a0-0x01a3
NV11 | 0x011 | Celsius | 28.06.2000 | PCI | 0x10de | 0x0110-0x0113
NV17 | 0x017 | Celsius | 06.02.2002 | PCI | 0x10de | 0x0170-0x017f
NV1F | 0x01f | Celsius | 01.10.2002 | PCI | 0x10de | 0x01f0-0x01ff
NV18 | 0x018 | Celsius | 25.09.2002 | PCI | 0x10de | 0x0180-0x018f
NV20 | 0x020 | Kelvin | 27.02.2001 | PCI | 0x10de | 0x0200-0x0203
NV2A | 0x02a | Kelvin | 15.11.2001 | PCI | 0x10de | 0x02a0-0x02a3
NV25 | 0x025 | Kelvin | 06.02.2002 | PCI | 0x10de | 0x0250-0x025f
NV28 | 0x028 | Kelvin | 20.01.2003 | PCI | 0x10de | 0x0280-0x028f
NV30 | 0x030 | Rankine | 27.01.2003 | PCI | 0x10de | 0x0300-0x030f
NV35 | 0x035 | Rankine | 12.05.2003 | PCI | 0x10de | 0x0330-0x033f
NV31 | 0x031 | Rankine | 06.03.2003 | PCI | 0x10de | 0x0310-0x031f
NV36 | 0x036 | Rankine | 23.10.2003 | PCI | 0x10de | 0x0340-0x034f
NV34 | 0x034 | Rankine | 06.03.2003 | PCI | 0x10de | 0x0320-0x032f
NV40 | 0x040 | Curie | 14.04.2004 | PCI | 0x10de | 0x0040-0x004f
NV45 | 0x045 | Curie | 14.04.2004 | PCI | 0x10de | 0x0040-0x004f
NV41 | 0x041 | Curie | 08.11.2004 | PCIE | 0x10de | 0x00c0-0x00cf
NV42 | 0x042 | Curie | 08.11.2004 | PCIE | 0x10de | 0x00c0-0x00cf
NV43 | 0x043 | Curie | 12.08.2004 | PCIE | 0x10de | 0x0140-0x014f
NV44 | 0x044 | Curie | 15.12.2004 | PCIE | 0x10de | 0x0160-0x016f
NV44A | 0x04a | Curie | 04.04.2005 | PCI | 0x10de | 0x0220-0x022f
G70 | 0x047 | Curie | 22.06.2005 | PCIE | 0x10de | 0x0090-0x009f
G72 | 0x046 | Curie | 18.01.2006 | PCIE | 0x10de | 0x01d0-0x01df
G71 | 0x049 | Curie | 09.03.2006 | PCIE | 0x10de | 0x0290-0x029f
G73 | 0x04b | Curie | 09.03.2006 | PCIE | 0x10de | 0x0390-0x039f
C51 | 0x04e | Curie | 20.10.2005 | IGP | 0x10de | 0x0240-0x024f
MCP61 | 0x04c | Curie | 06.2006 | IGP | 0x10de | 0x03d0-0x03df
MCP67 | 0x067 | Curie | 01.02.2006 | IGP | 0x10de | 0x0530-0x053f
MCP68 | 0x068 | Curie | 07.2007 | IGP | 0x10de | 0x0530-0x053f
MCP73 | 0x063 | Curie | 07.2007 | IGP | 0x10de | 0x07e0-0x07ef
RSX | 0x044 | Curie | 11.11.2006 | FlexIO | - | -
G80 | 0x050 | Tesla | 08.11.2006 | PCIE | 0x10de | 0x0190-0x019f
G84 | 0x084 | Tesla | 17.04.2007 | PCIE | 0x10de | 0x0400-0x040f
G86 | 0x086 | Tesla | 17.04.2007 | PCIE | 0x10de | 0x0420-0x042f
G92 | 0x092 | Tesla | 29.10.2007 | PCIE | 0x10de | 0x0600-0x061f
G94 | 0x094 | Tesla | 29.07.2008 | PCIE | 0x10de | 0x0620-0x063f
G96 | 0x096 | Tesla | 29.07.2008 | PCIE | 0x10de | 0x0640-0x065f
G98 | 0x098 | Tesla | 04.12.2007 | PCIE | 0x10de | 0x06e0-0x06ff
G200 | 0x0a0 | Tesla | 16.06.2008 | PCIE | 0x10de | 0x05e0-0x05ff
MCP77 | 0x0aa | Tesla | 06.2008 | IGP | 0x10de | 0x0840-0x085f
MCP79 | 0x0ac | Tesla | 06.2008 | IGP | 0x10de | 0x0860-0x087f
GT215 | 0x0a3 | Tesla | 15.06.2009 | PCIE | 0x10de | 0x0ca0-0x0cbf
GT216 | 0x0a5 | Tesla | 15.06.2009 | PCIE | 0x10de | 0x0a20-0x0a3f
GT218 | 0x0a8 | Tesla | 15.06.2009 | PCIE | 0x10de | 0x0a60-0x0a7f
MCP89 | 0x0af | Tesla | 01.04.2010 | IGP | 0x10de | 0x08a0-0x08bf
GF100 | 0x0c0 | Fermi | 26.03.2010 | PCIE | 0x10de | 0x06c0-0x06df
GF104 | 0x0c4 | Fermi | 12.07.2010 | PCIE | 0x10de | 0x0e20-0x0e3f
GF114 | 0x0ce | Fermi | 25.01.2011 | PCIE | 0x10de | 0x1200-0x121f
GF106 | 0x0c3 | Fermi | 03.09.2010 | PCIE | 0x10de | 0x0dc0-0x0ddf
GF116 | 0x0cf | Fermi | 15.03.2011 | PCIE | 0x10de | 0x1240-0x125f
GF108 | 0x0c1 | Fermi | 03.09.2010 | PCIE | 0x10de | 0x0de0-0x0dff
GF110 | 0x0c8 | Fermi | 07.12.2010 | PCIE | 0x10de | 0x1080-0x109f
GF119 | 0x0d9 | Fermi | 05.01.2011 | PCIE | 0x10de | 0x1040-0x107f
GF117 | 0x0d7 | Fermi | 04.2012 | PCIE | 0x10de | 0x1140-0x117f
GK104 | 0x0e4 | Kepler | 22.03.2012 | PCIE | 0x10de | 0x1180-0x11bf
GK107 | 0x0e7 | Kepler | 24.04.2012 | PCIE | 0x10de | 0x0fc0-0x0fff
GK106 | 0x0e6 | Kepler | 22.04.2012 | PCIE | 0x10de | 0x11c0-0x11ff
GK110 | 0x0f0 | Kepler | 21.02.2013 | PCIE | 0x10de | 0x1000-0x103f
GK110B | 0x0f1 | Kepler | 07.11.2013 | PCIE | 0x10de | 0x1000-0x103f
GK210 | ? | Kepler | ? | PCIE | 0x10de | ?
GK208 | 0x108 | Kepler | 19.02.2013 | PCIE | 0x10de | 0x1280-0x12bf
GK208B | 0x106 | Kepler | ? | PCIE | 0x10de | 0x1280-0x12bf
GK20A | 0x0ea | Kepler | ? | Tegra | - | -
GM107 | 0x117 | Maxwell | 18.02.2014 | PCIE | 0x10de | 0x1380-0x13bf
GM108 | 0x118 | Maxwell | ? | PCIE | 0x10de | 0x1340-0x137f
GM204 | 0x124 | Maxwell | ? | PCIE | 0x10de | 0x13c0-0x13ff
GM200 | 0x120 | Maxwell | ? | PCIE | 0x10de | 0x17c0-0x17ff
GM206 | 0x126 | Maxwell | ? | PCIE | 0x10de | 0x1400-0x143f
GM20B | 0x12b | Maxwell | ? | Tegra | - | -
GP100 | 0x130 | Pascal | ? | PCIE | 0x10de | 0x1580-0x15ff
GP102 | 0x132 | Pascal | ? | PCIE | 0x10de | 0x1b00-0x1b7f
GP104 | 0x134 | Pascal | ? | PCIE | 0x10de | 0x1b80-0x1bff
GP106 | 0x136 | Pascal | ? | PCIE | 0x10de | 0x1c00-0x1c7f
GP107 | 0x137 | Pascal | 25.10.2016 | PCIE | 0x10de | 0x1c80-0x1cff
GP108 | 0x138 | Pascal | ? | PCIE | 0x10de | 0x1d00-0x1d7f
GP10B | 0x13b | Pascal | 14.03.2017 | Tegra | 0x10de | 0x10e5-0x1164
GV100 | 0x140 | Volta | 12.07.2017 | PCIE | 0x10de | 0x1d80-0x1dff
GV11B | 0x15b | Volta | 03.06.2018 | Tegra | - | -
TU102 | 0x162 | Turing | 27.09.2018 | PCIE | 0x10de | 0x1e00-0x1e7f
TU104 | 0x164 | Turing | 20.09.2018 | PCIE | 0x10de | 0x1e80-0x1eff
TU106 | 0x166 | Turing | 17.10.2018 | PCIE | 0x10de | 0x1f00-0x1f7f
TU116 | 0x168 | Turing | 22.02.2019 | PCIE | 0x10de | 0x2180-0x21ff
TU117 | 0x167 | Turing | 23.04.2019 | PCIE | 0x10de | 0x1f80-0x1fff
- Introduction
- GPUs
  * NV5
  * NV10
  * NV15
  * NV11
  * NV20
  * NV17
  * NV18
  * NV1F (GPU)
  * NV25
  * NV28
  * NV30
  * NV31
  * NV34
  * NV35
  * NV36
  * NV40
  * NV41/NV42
  * NV43
  * NV44
  * NV44A
  * C51 GPU
  * G70
  * G72
  * G71
  * G73
  * MCP61 GPU
  * MCP67 GPU
  * MCP73 GPU
  * G80
  * G84
  * G86
  * G92
  * G94
  * G96
  * G98
  * G200
  * MCP77 GPU
  * MCP79 GPU
  * GT215
  * GT216
  * GT218
  * MCP89 GPU
  * GF100
  * GF104
  * GF114
  * GF106
  * GF116
  * GF108
  * GF110
  * GF119
  * GF117
  * GK104
  * GK106
  * GK107
  * GK110/GK110B
  * GK208
  * GM107
  * GM108
  * GM204
  * GM206
  * GP100
  * GP102
  * GP104
  * GP106
  * GP107
  * GP108
  * GV100
  * TU102
  * TU104
  * TU106
  * TU116
  * TU117
- GPU HDA codecs
- GPU USB controllers
- BR02
- BR03
- BR04
- Motherboard chipsets
  * NV1A [nForce 220 IGP / 420 IGP / 415 SPP]
  * NV2A [XGPU]
  * MCP
  * NV1F [nForce2 IGP/SPP]
  * MCP2
  * MCP2A
  * CK8
  * CK8S
  * CK804
  * C19
2.3.1 Introduction

nVidia uses PCI vendor id 0x10de, which covers almost all of their products. Other ids used for nVidia products
include 0x104a (SGS-Thomson) and 0x12d2 (SGS-Thomson/nVidia joint venture). The PCI device ids with
vendor id 0x104a related to nVidia are:

device id | product
0x0008 | NV1 main function, DRAM version (SGS-Thomson branding)
0x0009 | NV1 VGA function, DRAM version (SGS-Thomson branding)

The PCI device ids with vendor id 0x12d2 are:

device id | product
0x0018 | NV3 [RIVA 128]
0x0019 | NV3T [RIVA 128 ZX]

All other nVidia PCI devices use vendor id 0x10de. This includes:
* GPUs
* motherboard chipsets
* BR03 and NF200 PCIE switches
* the BR02 transparent AGP/PCIE bridge
* GVI, the SDI input card
The PCI device ids with vendor id 0x10de are: 


device id | product
0x0008 | NV1 main function, VRAM version (nVidia branding)
0x0009 | NV1 VGA function, VRAM version (nVidia branding)
0x0020 | NV4 [RIVA TNT]
0x0028-0x002f | NV5
0x0030-0x003f | MCP04
0x0040-0x004f | NV40
0x0050-0x005f | CK804
0x0060-0x006e | MCP2
0x006f-0x007f | C19
0x0080-0x008f | MCP2A
0x0090-0x009f | G70
0x00a0 | NVA [Aladdin TNT2]
0x00b0 | NV18 Firewire
0x00b4 | C19
0x00c0-0x00cf | NV41/NV42
0x00d0-0x00d2 | CK8
0x00d3 | CK804
0x00d4-0x00dd | CK8
0x00df-0x00ef | CK8S
0x00f0-0x00ff | BR02
0x0100-0x0103 | NV10
0x0110-0x0113 | NV11
0x0140-0x014f | NV43
0x0150-0x0153 | NV15
0x0160-0x016f | NV44
0x0170-0x017f | NV17
0x0180-0x018f | NV18
0x0190-0x019f | G80
0x01a0-0x01af | NV1A
0x01b0-0x01b2 | MCP
0x01b3 | BR03
0x01b4 | MCP
0x01b7 | NV1A, NV2A
0x01b8-0x01cf | MCP
0x01d0-0x01df | G72
0x01e0-0x01f0 | NV1F
0x01f0-0x01ff | NV1F GPU
0x0200-0x0203 | NV20
0x0210-0x021f | NV40?
0x0220-0x022f | NV44A
0x0240-0x024f | C51 GPU
0x0250-0x025f | NV25
0x0260-0x0272 | MCP51
0x027e-0x027f | C51
0x0280-0x028f | NV28
0x0290-0x029f | G71
0x02a0-0x02af | NV2A
0x02e0-0x02ef | BR02
0x02f0-0x02ff | C51
0x0300-0x030f | NV30
0x0310-0x031f | NV31
0x0320-0x032f | NV34
0x0330-0x033f | NV35
0x0340-0x034f | NV36
0x0360-0x037f | MCP55
0x0390-0x039f | G73
0x03a0-0x03bc | C55
0x03d0-0x03df | MCP61 GPU
0x03e0-0x03f7 | MCP61
0x0400-0x040f | G84
0x0410-0x041f | G92 extra IDs
0x0420-0x042f | G86
0x0440-0x045f | MCP65
0x0530-0x053f | MCP67 GPU
0x0540-0x0563 | MCP67
0x0568-0x0569 | MCP77
0x056a-0x056f | MCP73
0x0570-0x057f | MCP* ethernet alt ID
0x0580-0x058f | MCP* SATA alt ID
0x0590-0x059f | MCP* HDA alt ID
0x05a0-0x05af | MCP* IDE alt ID
0x05b0-0x05bf | BR04
0x05e0-0x05ff | G200
0x0600-0x061f | G92
0x0620-0x063f | G94
0x0640-0x065f | G96
0x06c0-0x06df | GF100
0x06e0-0x06ff | G98
0x0750-0x077f | MCP77
0x07c0-0x07df | MCP73
0x07e0-0x07ef | MCP73 GPU
0x07f0-0x07fe | MCP73
0x0800-0x081a | C73
0x0840-0x085f | MCP77 GPU
0x0860-0x087f | MCP79 GPU
0x08a0-0x08bf | MCP89 GPU
0x0a20-0x0a3f | GT216
0x0a60-0x0a7f | GT218
0x0a80-0x0ac8 | MCP79
0x0ad0-0x0adb | MCP77
0x0be0-0x0bef | GPU HDA
0x0bf0-0x0bf1 | T20
0x0ca0-0x0cbf | GT215
0x0d60-0x0d9d | MCP89
0x0dc0-0x0ddf | GF106
0x0de0-0x0dff | GF108
0x0e00 | GVI SDI input
0x0e08-0x0e0f | GPU HDA
0x0e12-0x0e13 | T124
0x0e1a-0x0e1b | GPU HDA
0x0e1c-0x0e1d | T30
0x0e20-0x0e3f | GF104
0x0f00-0x0f1f | GF108 extra IDs
0x0fae-0x0faf | T210
0x0fb0-0x0fbf | GPU HDA
0x0fc0-0x0fff | GK107
0x1000-0x103f | GK110/GK110B
0x1040-0x107f | GF119
0x1080-0x109f | GF110
0x10c0-0x10df | GT216 extra IDs
0x10e5-0x10e6 | T186
0x10ef-0x10f9 | GPU HDA
0x1140-0x117f | GF117
0x1180-0x11bf | GK104
0x11c0-0x11ff | GK106
0x1200-0x121f | GF114
0x1240-0x125f | GF116
0x1280-0x12bf | GK208
0x1340-0x137f | GM108
0x1380-0x13bf | GM107
0x13c0-0x13ff | GM204
0x1400-0x143f | GM206
0x1580-0x15ff | GP100
0x1617-0x161a | GM204 extra IDs
0x1667 | GM204 extra ID
0x1ad0-0x1adf | GPU USB
0x1b00-0x1b7f | GP102
0x1b80-0x1bff | GP104
0x1c00-0x1c7f | GP106
0x1c80-0x1cff | GP107
0x1d00-0x1d7f | GP108
0x1d80-0x1dff | GV100
0x1e00-0x1e7f | TU102
0x1e80-0x1eff | TU104
0x1f00-0x1f7f | TU106
0x2180-0x21ff | TU116
0x1f80-0x1fff | TU117
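Most entries in these tables are contiguous id ranges, so identifying a device only requires keeping the range starts in sorted order and binary-searching them. The following minimal Python sketch assumes the ranges do not overlap. Note that the 0x01e0-0x01f0 and 0x01f0-0x01ff rows above do touch, so a full table would need that resolved. RANGES_10DE holds only a few rows copied from the tables above, and identify is a hypothetical helper, not part of any existing tool:

```python
import bisect
from typing import Optional

# (first id, last id, product) for vendor 0x10de: a small excerpt of the table.
RANGES_10DE = sorted([
    (0x0020, 0x0020, "NV4 [RIVA TNT]"),
    (0x0040, 0x004F, "NV40"),
    (0x00C0, 0x00CF, "NV41/NV42"),
    (0x0190, 0x019F, "G80"),
    (0x0410, 0x041F, "G92 extra IDs"),
    (0x0600, 0x061F, "G92"),
    (0x0BE0, 0x0BEF, "GPU HDA"),
    (0x1180, 0x11BF, "GK104"),
    (0x1E00, 0x1E7F, "TU102"),
])
STARTS = [first for first, _, _ in RANGES_10DE]

# The other two vendor ids only cover a handful of devices.
OTHER_VENDORS = {
    (0x104A, 0x0008): "NV1 main function, DRAM version",
    (0x104A, 0x0009): "NV1 VGA function, DRAM version",
    (0x12D2, 0x0018): "NV3 [RIVA 128]",
    (0x12D2, 0x0019): "NV3T [RIVA 128 ZX]",
}


def identify(vendor: int, device: int) -> Optional[str]:
    """Return the table entry for a PCI (vendor, device) pair, or None if unlisted."""
    if vendor != 0x10DE:
        return OTHER_VENDORS.get((vendor, device))
    # Find the last range starting at or below the device id, then check its end.
    i = bisect.bisect_right(STARTS, device) - 1
    if i >= 0:
        first, last, product = RANGES_10DE[i]
        if device <= last:
            return product
    return None


print(identify(0x10DE, 0x0415))  # G92 extra IDs
print(identify(0x12D2, 0x0019))  # NV3T [RIVA 128 ZX]
```

Keeping the starts in a separate list lets `bisect` work on plain integers, and the end check rejects ids that fall into gaps between ranges.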


2.3.2 GPUs

NV5
device id | product
0x0028 | NV5 [RIVA TNT2]
0x0029 | NV5 [RIVA TNT2 Ultra]
0x002c | NV5 [Vanta]
0x002d | NV5 [RIVA TNT2 Model 64]

NV10
device id | product
0x0100 | NV10 [GeForce 256 SDR]
0x0101 | NV10 [GeForce 256 DDR]
0x0102 | NV10 [GeForce 256 Ultra]
0x0103 | NV10 [Quadro]
NV15
device id | product
0x0150 | NV15 [GeForce2 GTS/Pro]
0x0151 | NV15 [GeForce2 Ti]
0x0152 | NV15 [GeForce2 Ultra]
0x0153 | NV15 [Quadro2 Pro]

NV11
device id | product
0x0110 | NV11 [GeForce2 MX/MX 400]
0x0111 | NV11 [GeForce2 MX 100/200]
0x0112 | NV11 [GeForce2 Go]
0x0113 | NV11 [Quadro2 MXR/EX/Go]


NV20
device id | product
0x0200 | NV20 [GeForce3]
0x0201 | NV20 [GeForce3 Ti 200]
0x0202 | NV20 [GeForce3 Ti 500]
0x0203 | NV20 [Quadro DCC]


NV17
device id | product
0x0170 | NV17 [GeForce4 MX 460]
0x0171 | NV17 [GeForce4 MX 440]
0x0172 | NV17 [GeForce4 MX 420]
0x0173 | NV17 [GeForce4 MX 440-SE]
0x0174 | NV17 [GeForce4 440 Go]
0x0175 | NV17 [GeForce4 420 Go]
0x0176 | NV17 [GeForce4 420 Go 32M]
0x0177 | NV17 [GeForce4 460 Go]
0x0178 | NV17 [Quadro4 550 XGL]
0x0179 | NV17 [GeForce4 440 Go 64M]
0x017a | NV17 [Quadro NVS 100/200/400]
0x017b | NV17 [Quadro4 550 XGL]???
0x017c | NV17 [Quadro4 500 GoGL]
0x017d | NV17 [GeForce4 410 Go 16M]
NV18
device id | product
0x0181 | NV18 [GeForce4 MX 440 AGP 8x]
0x0182 | NV18 [GeForce4 MX 440-SE AGP 8x]
0x0183 | NV18 [GeForce4 MX 420 AGP 8x]
0x0185 | NV18 [GeForce4 MX 4000]
0x0186 | NV18 [GeForce4 448 Go]
0x0187 | NV18 [GeForce4 488 Go]
0x0188 | NV18 [Quadro4 580 XGL]
0x0189 | NV18 [GeForce4 MX AGP 8x (Mac)]
0x018a | NV18 [Quadro NVS 280 SD]
0x018b | NV18 [Quadro4 380 XGL]
0x018c | NV18 [Quadro NVS 50 PCI]
0x018d | NV18 [GeForce4 448 Go]
0x00b0 | NV18 Firewire controller


NV1F (GPU)
device id | product
0x01f0 | NV1F GPU [GeForce4 MX IGP]


NV25
device id | product
0x0250 | NV25 [GeForce4 Ti 4600]
0x0251 | NV25 [GeForce4 Ti 4400]
0x0252 | NV25 [GeForce4 Ti]
0x0253 | NV25 [GeForce4 Ti 4200]
0x0258 | NV25 [Quadro4 900 XGL]
0x0259 | NV25 [Quadro4 750 XGL]
0x025b | NV25 [Quadro4 700 XGL]

NV28
device id | product
0x0280 | NV28 [GeForce4 Ti 4800]
0x0281 | NV28 [GeForce4 Ti 4200 AGP 8x]
0x0282 | NV28 [GeForce4 Ti 4800 SE]
0x0286 | NV28 [GeForce4 Ti 4200 Go]
0x0288 | NV28 [Quadro4 980 XGL]
0x0289 | NV28 [Quadro4 780 XGL]
0x028c | NV28 [Quadro4 700 GoGL]
NV30
device id | product
0x0301 | NV30 [GeForce FX 5800 Ultra]
0x0302 | NV30 [GeForce FX 5800]
0x0308 | NV35 [Quadro FX 2000]
0x0309 | NV35 [Quadro FX 1000]

NV31
device id | product
0x0311 | NV31 [GeForce FX 5600 Ultra]
0x0312 | NV31 [GeForce FX 5600]
0x0314 | NV31 [GeForce FX 5600XT]
0x031a | NV31 [GeForce FX Go5600]
0x031b | NV31 [GeForce FX Go5650]
0x031c | NV31 [GeForce FX Go700]

NV34
device id | product
0x0320 | NV34 [GeForce FX 5200]
0x0321 | NV34 [GeForce FX 5200 Ultra]
0x0322 | NV34 [GeForce FX 5200]
0x0323 | NV34 [GeForce FX 5200LE]
0x0324 | NV34 [GeForce FX Go5200]
0x0325 | NV34 [GeForce FX Go5250]
0x0326 | NV34 [GeForce FX 5500]
0x0327 | NV34 [GeForce FX 5100]
0x0328 | NV34 [GeForce FX Go5200 32M/64M]
0x0329 | NV34 [GeForce FX Go5200 (Mac)]
0x032a | NV34 [Quadro NVS 280 PCI]
0x032b | NV34 [Quadro FX 500/FX 600]
0x032c | NV34 [GeForce FX Go5300/Go5350]
0x032d | NV34 [GeForce FX Go5100]

NV35
device id | product
0x0330 | NV35 [GeForce FX 5900 Ultra]
0x0331 | NV35 [GeForce FX 5900]
0x0332 | NV35 [GeForce FX 5900XT]
0x0333 | NV35 [GeForce FX 5950 Ultra]
0x0334 | NV35 [GeForce FX 5900ZT]
0x0338 | NV35 [Quadro FX 3000]
0x033f | NV35 [Quadro FX 700]
NV36
device id | product
0x0341 | NV36 [GeForce FX 5700 Ultra]
0x0342 | NV36 [GeForce FX 5700]
0x0343 | NV36 [GeForce FX 5700LE]
0x0344 | NV36 [GeForce FX 5700VE]
0x0347 | NV36 [GeForce FX Go5700]
0x0348 | NV36 [GeForce FX Go5700]
0x034c | NV36 [Quadro FX Go1000]
0x034e | NV36 [Quadro FX 1100]
NV40
device id | product
0x0040 | NV40 [GeForce 6800 Ultra]
0x0041 | NV40 [GeForce 6800]
0x0042 | NV40 [GeForce 6800 LE]
0x0043 | NV40 [GeForce 6800 XE]
0x0044 | NV40 [GeForce 6800 XT]
0x0045 | NV40 [GeForce 6800 GT]
0x0046 | NV40 [GeForce 6800 GT]
0x0047 | NV40 [GeForce 6800 GS]
0x0048 | NV40 [GeForce 6800 XT]
0x004e | NV40 [Quadro FX 4000]
0x0211 | NV40? [GeForce 6800]
0x0212 | NV40? [GeForce 6800 LE]
0x0215 | NV40? [GeForce 6800 GT]
0x0218 | NV40? [GeForce 6800 XT]

Todo: wtf is with that 0x21x ID?


NV41/NV42
device id | product
0x00c0 | NV41/NV42 [GeForce 6800 GS]
0x00c1 | NV41/NV42 [GeForce 6800]
0x00c2 | NV41/NV42 [GeForce 6800 LE]
0x00c3 | NV41/NV42 [GeForce 6800 XT]
0x00c8 | NV41/NV42 [GeForce Go 6800]
0x00c9 | NV41/NV42 [GeForce Go 6800 Ultra]
0x00cc | NV41/NV42 [Quadro FX Go1400]
0x00cd | NV41/NV42 [Quadro FX 3450/4000 SDI]
0x00ce | NV41/NV42 [Quadro FX 1400]




NV43
device id | product
0x0140 | NV43 [GeForce 6600 GT]
0x0141 | NV43 [GeForce 6600]
0x0142 | NV43 [GeForce 6600 LE]
0x0143 | NV43 [GeForce 6600 VE]
0x0144 | NV43 [GeForce Go 6600]
0x0145 | NV43 [GeForce 6610 XL]
0x0146 | NV43 [GeForce Go 6200 TE / 6660 TE]
0x0147 | NV43 [GeForce 6700 XL]
0x0148 | NV43 [GeForce Go 6600]
0x0149 | NV43 [GeForce Go 6600 GT]
0x014a | NV43 [Quadro NVS 440]
0x014c | NV43 [Quadro FX 540M]
0x014d | NV43 [Quadro FX 550]
0x014e | NV43 [Quadro FX 540]
0x014f | NV43 [GeForce 6200]

NV44
device id | product
0x0160 | NV44 [GeForce 6500]
0x0161 | NV44 [GeForce 6200 TurboCache]
0x0162 | NV44 [GeForce 6200 SE TurboCache]
0x0163 | NV44 [GeForce 6200 LE]
0x0164 | NV44 [GeForce Go 6200]
0x0165 | NV44 [Quadro NVS 285]
0x0166 | NV44 [GeForce Go 6400]
0x0167 | NV44 [GeForce Go 6200]
0x0168 | NV44 [GeForce Go 6400]
0x0169 | NV44 [GeForce 6250]
0x016a | NV44 [GeForce 7100 GS]

NV44A
device id | product
0x0221 | NV44A [GeForce 6200 (AGP)]
0x0222 | NV44A [GeForce 6200 A-LE (AGP)]
C51 GPU
device id | product
0x0240 | C51 GPU [GeForce 6150]
0x0241 | C51 GPU [GeForce 6150 LE]
0x0242 | C51 GPU [GeForce 6100]
0x0244 | C51 GPU [GeForce Go 6150]
0x0245 | C51 GPU [Quadro NVS 210S / NVIDIA GeForce 6150LE]
0x0247 | C51 GPU [GeForce Go 6100]


G70
device id | product
0x0090 | G70 [GeForce 7800 GTX]
0x0091 | G70 [GeForce 7800 GTX]
0x0092 | G70 [GeForce 7800 GT]
0x0093 | G70 [GeForce 7800 GS]
0x0095 | G70 [GeForce 7800 SLI]
0x0098 | G70 [GeForce Go 7800]
0x0099 | G70 [GeForce Go 7800 GTX]
0x009d | G70 [Quadro FX 4500]

G72
device id | product
0x01d0 | G72 [GeForce 7350 LE]
0x01d1 | G72 [GeForce 7300 LE]
0x01d2 | G72 [GeForce 7550 LE]
0x01d3 | G72 [GeForce 7300 SE/7200 GS]
0x01d6 | G72 [GeForce Go 7200]
0x01d7 | G72 [Quadro NVS 110M / GeForce Go 7300]
0x01d8 | G72 [GeForce Go 7400]
0x01d9 | G72 [GeForce Go 7450]
0x01da | G72 [Quadro NVS 110M]
0x01db | G72 [Quadro NVS 120M]
0x01dc | G72 [Quadro FX 350M]
0x01dd | G72 [GeForce 7500 LE]
0x01de | G72 [Quadro FX 350]
0x01df | G72 [GeForce 7300 GS]
G71
device id | product
0x0290 | G71 [GeForce 7900 GTX]
0x0291 | G71 [GeForce 7900 GT/GTO]
0x0292 | G71 [GeForce 7900 GS]
0x0293 | G71 [GeForce 7900 GX2]
0x0294 | G71 [GeForce 7950 GX2]
0x0295 | G71 [GeForce 7950 GT]
0x0297 | G71 [GeForce Go 7950 GTX]
0x0298 | G71 [GeForce Go 7900 GS]
0x0299 | G71 [GeForce Go 7900 GTX]
0x029a | G71 [Quadro FX 2500M]
0x029b | G71 [Quadro FX 1500M]
0x029c | G71 [Quadro FX 5500]
0x029d | G71 [Quadro FX 3500]
0x029e | G71 [Quadro FX 1500]
0x029f | G71 [Quadro FX 4500 X2]

G73
device id | product
0x0390 | G73 [GeForce 7650 GS]
0x0391 | G73 [GeForce 7600 GT]
0x0392 | G73 [GeForce 7600 GS]
0x0393 | G73 [GeForce 7300 GT]
0x0394 | G73 [GeForce 7600 LE]
0x0395 | G73 [GeForce 7300 GT]
0x0397 | G73 [GeForce Go 7700]
0x0398 | G73 [GeForce Go 7600]
0x0399 | G73 [GeForce Go 7600 GT]
0x039a | G73 [Quadro NVS 300M]
0x039b | G73 [GeForce Go 7900 SE]
0x039c | G73 [Quadro FX 560M]
0x039e | G73 [Quadro FX 560]

MCP61 GPU
device id | product
0x03d0 | MCP61 GPU [GeForce 6150SE nForce 430]
0x03d1 | MCP61 GPU [GeForce 6100 nForce 405]
0x03d2 | MCP61 GPU [GeForce 6100 nForce 400]
0x03d5 | MCP61 GPU [GeForce 6100 nForce 420]
0x03d6 | MCP61 GPU [GeForce 7025 / nForce 630a]
MCP67 GPU
device id | product
0x0531 | MCP67 GPU [GeForce 7150M / nForce 630M]
0x0533 | MCP67 GPU [GeForce 7000M / nForce 610M]
0x053a | MCP67 GPU [GeForce 7050 PV / nForce 630a]
0x053b | MCP67 GPU [GeForce 7050 PV / nForce 630a]
0x053e | MCP67 GPU [GeForce 7025 / nForce 630a]

Note: mobile is apparently considered to be MCP67, desktop MCP68

MCP73 GPU
device id | product
0x07e0 | MCP73 GPU [GeForce 7150 / nForce 630i]
0x07e1 | MCP73 GPU [GeForce 7100 / nForce 630i]
0x07e2 | MCP73 GPU [GeForce 7050 / nForce 630i]
0x07e3 | MCP73 GPU [GeForce 7050 / nForce 610i]
0x07e5 | MCP73 GPU [GeForce 7050 / nForce 620i]
G80
device id | product
0x0191 | G80 [GeForce 8800 GTX]
0x0193 | G80 [GeForce 8800 GTS]
0x0194 | G80 [GeForce 8800 Ultra]
0x0197 | G80 [Tesla C870]
0x019d | G80 [Quadro FX 5600]
0x019e | G80 [Quadro FX 4600]

G84
device id | product
0x0400 | G84 [GeForce 8600 GTS]
0x0401 | G84 [GeForce 8600 GT]
0x0402 | G84 [GeForce 8600 GT]
0x0403 | G84 [GeForce 8600 GS]
0x0404 | G84 [GeForce 8400 GS]
0x0405 | G84 [GeForce 9500M GS]
0x0406 | G84 [GeForce 8300 GS]
0x0407 | G84 [GeForce 8600M GT]
0x0408 | G84 [GeForce 9650M GS]
0x0409 | G84 [GeForce 8700M GT]
0x040a | G84 [Quadro FX 370]
0x040b | G84 [Quadro NVS 320M]
0x040c | G84 [Quadro FX 570M]
0x040d | G84 [Quadro FX 1600M]
0x040e | G84 [Quadro FX 570]
0x040f | G84 [Quadro FX 1700]

G86
device id | product
0x0420 | G86 [GeForce 8400 SE]
0x0421 | G86 [GeForce 8500 GT]
0x0422 | G86 [GeForce 8400 GS]
0x0423 | G86 [GeForce 8300 GS]
0x0424 | G86 [GeForce 8400 GS]
0x0425 | G86 [GeForce 8600M GS]
0x0426 | G86 [GeForce 8400M GT]
0x0427 | G86 [GeForce 8400M GS]
0x0428 | G86 [GeForce 8400M G]
0x0429 | G86 [Quadro NVS 140M]
0x042a | G86 [Quadro NVS 130M]
0x042b | G86 [Quadro NVS 135M]
0x042c | G86 [GeForce 9400 GT]
0x042d | G86 [Quadro FX 360M]
0x042e | G86 [GeForce 9300M G]
0x042f | G86 [Quadro NVS 290]

G92
device id | product
0x0410 | G92 [GeForce GT 330]
0x0600 | G92 [GeForce 8800 GTS 512]
0x0601 | G92 [GeForce 9800 GT]
0x0602 | G92 [GeForce 8800 GT]
0x0603 | G92 [GeForce GT 230]
0x0604 | G92 [GeForce 9800 GX2]
0x0605 | G92 [GeForce 9800 GT]
0x0606 | G92 [GeForce 8800 GS]
0x0607 | G92 [GeForce GTS 240]
0x0608 | G92 [GeForce 9800M GTX]
0x0609 | G92 [GeForce 8800M GTS]
0x060a | G92 [GeForce GTX 280M]
0x060b | G92 [GeForce 9800M GT]
0x060c | G92 [GeForce 8800M GTX]
0x060f | G92 [GeForce GTX 285M]
0x0610 | G92 [GeForce 9600 GSO]
0x0611 | G92 [GeForce 8800 GT]
0x0612 | G92 [GeForce 9800 GTX/9800 GTX+]
0x0613 | G92 [GeForce 9800 GTX+]
0x0614 | G92 [GeForce 9800 GT]
0x0615 | G92 [GeForce GTS 250]
0x0617 | G92 [GeForce 9800M GTX]
0x0618 | G92 [GeForce GTX 260M]
0x0619 | G92 [Quadro FX 4700 X2]
0x061a | G92 [Quadro FX 3700]
0x061b | G92 [Quadro VX 200]
0x061c | G92 [Quadro FX 3600M]
0x061d | G92 [Quadro FX 2800M]
0x061e | G92 [Quadro FX 3700M]
0x061f | G92 [Quadro FX 3800M]


G94
device id | product
0x0621 | G94 [GeForce GT 230]
0x0622 | G94 [GeForce 9600 GT]
0x0623 | G94 [GeForce 9600 GS]
0x0625 | G94 [GeForce 9600 GSO 512]
0x0626 | G94 [GeForce GT 130]
0x0627 | G94 [GeForce GT 140]
0x0628 | G94 [GeForce 9800M GTS]
0x062a | G94 [GeForce 9700M GTS]
0x062b | G94 [GeForce 9800M GS]
0x062c | G94 [GeForce 9800M GTS]
0x062d | G94 [GeForce 9600 GT]
0x062e | G94 [GeForce 9600 GT]
0x0631 | G94 [GeForce GTS 160M]
0x0635 | G94 [GeForce 9600 GSO]
0x0637 | G94 [GeForce 9600 GT]
0x0638 | G94 [Quadro FX 1800]
0x063a | G94 [Quadro FX 2700M]
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G96 
device id | product 
0x0640 | G96 [GeForce 9500 GT] 
0x0641 | G96 [GeForce 9400 GT] 
0x0643 | G96 [GeForce 9500 GT] 
0x0644 | G96 [GeForce 9500 GS] 
0x0645 | G96 [GeForce 9500 GS] 
0x0646 | G96 [GeForce GT 120] 
0x0647 | G96 [GeForce 9600M GT]
0x0648 | G96 [GeForce 9600M GS]
0x0649 | G96 [GeForce 9600M GT]
0x064a | G96 [GeForce 9700M GT]
0x064b | G96 [GeForce 9500M G]
0x064c | G96 [GeForce 9650M GT]
0x0651 | G96 [GeForce G 110M]
0x0652 | G96 [GeForce GT 130M]
0x0653 | G96 [GeForce GT 120M]
0x0654 | G96 [GeForce GT 220M]
0x0655 | G96 [GeForce GT 120]
0x0656 | G96 [GeForce GT 120]
0x0658 | G96 [Quadro FX 380]
0x0659 | G96 [Quadro FX 580]
0x065a | G96 [Quadro FX 1700M]
0x065b | G96 [GeForce 9400 GT]
0x065c | G96 [Quadro FX 770M]
0x065f | G96 [GeForce G210]
G98
device id | product 
0x06e0 | G98 [GeForce 9300 GE] 
0x06e1 | G98 [GeForce 9300 GS]
0x06e2 | G98 [GeForce 8400]
0x06e3 | G98 [GeForce 8400 SE]
0x06e4 | G98 [GeForce 8400 GS]
0x06e6 | G98 [GeForce G100]
0x06e7 | G98 [GeForce 9300 SE]
0x06e8 | G98 [GeForce 9200M GS]
0x06e9 | G98 [GeForce 9300M GS]
0x06ea | G98 [Quadro NVS 150M]
0x06eb | G98 [Quadro NVS 160M]
0x06ec | G98 [GeForce G 105M]
0x06ef | G98 [GeForce G 103M]
0x06f1 | G98 [GeForce G105M]
0x06f8 | G98 [Quadro NVS 420]
0x06f9 | G98 [Quadro FX 370 LP]
0x06fa | G98 [Quadro NVS 450]
0x06fb | G98 [Quadro FX 370M]
0x06fd | G98 [Quadro NVS 295]
0x06ff | G98 [HICx16 + Graphics]

G200 


device id | product 

0x05e0 | G200 [GeForce GTX 295]
0x05e1 | G200 [GeForce GTX 280]
0x05e2 | G200 [GeForce GTX 260]
0x05e3 | G200 [GeForce GTX 285]
0x05e6 | G200 [GeForce GTX 275]
0x05e7 | G200 [Tesla C1060]
0x05e9 | G200 [Quadro CX]
0x05ea | G200 [GeForce GTX 260]
0x05eb | G200 [GeForce GTX 295]
0x05ed | G200 [Quadro FX 5800]
0x05ee | G200 [Quadro FX 4800]
0x05ef | G200 [Quadro FX 3800]
MCP77 GPU
device id | product 
0x0840 | MCP77 GPU [GeForce 8200M]
0x0844 | MCP77 GPU [GeForce 9100M G]
0x0845 | MCP77 GPU [GeForce 8200M G]
0x0846 | MCP77 GPU [GeForce 9200]
0x0847 | MCP77 GPU [GeForce 9100]
0x0848 | MCP77 GPU [GeForce 8300]
0x0849 | MCP77 GPU [GeForce 8200]
0x084a | MCP77 GPU [nForce 730a] 
0x084b | MCP77 GPU [GeForce 9200] 
0x084c | MCP77 GPU [nForce 980a/780a SLI] 
0x084d | MCP77 GPU [nForce 750a SLI] 
0x084f | MCP77 GPU [GeForce 8100 / nForce 720a] 

MCP79 GPU 
device id | product 
0x0860 | MCP79 GPU [GeForce 9400]
0x0861 | MCP79 GPU [GeForce 9400]
0x0862 | MCP79 GPU [GeForce 9400M G]
0x0863 | MCP79 GPU [GeForce 9400M]
0x0864 | MCP79 GPU [GeForce 9300]
0x0865 | MCP79 GPU [ION]
0x0866 | MCP79 GPU [GeForce 9400M G]
0x0867 | MCP79 GPU [GeForce 9400]
0x0868 | MCP79 GPU [nForce 760i SLI]
0x0869 | MCP79 GPU [GeForce 9400] 
0x086a | MCP79 GPU [GeForce 9400] 
0x086c | MCP79 GPU [GeForce 9300 / nForce 730i] 
0x086d | MCP79 GPU [GeForce 9200] 
0x086e | MCP79 GPU [GeForce 9100M G] 
0x086f | MCP79 GPU [GeForce 8200M G] 
0x0870 | MCP79 GPU [GeForce 9400M] 
0x0871 | MCP79 GPU [GeForce 9200] 
0x0872 | MCP79 GPU [GeForce G102M] 
0x0873 | MCP79 GPU [GeForce G102M] 
0x0874 | MCP79 GPU [ION] 
0x0876 | MCP79 GPU [ION] 
0x087a | MCP79 GPU [GeForce 9400] 
0x087d | MCP79 GPU [ION] 
0x087e | MCP79 GPU [ION LE] 
0x087f | MCP79 GPU [ION LE] 
GT215
device id | product 
0x0ca0 | GT215 [GeForce GT 330]
0x0ca2 | GT215 [GeForce GT 320]
0x0ca3 | GT215 [GeForce GT 240]
0x0ca4 | GT215 [GeForce GT 340]
0x0ca5 | GT215 [GeForce GT 220]
0x0ca7 | GT215 [GeForce GT 330]
0x0ca9 | GT215 [GeForce GTS 250M]
0x0cac | GT215 [GeForce GT 220]
0x0caf | GT215 [GeForce GT 335M]
0x0cb0 | GT215 [GeForce GTS 350M]
0x0cb1 | GT215 [GeForce GTS 360M]
0x0cbc | GT215 [Quadro FX 1800M]

GT216 


device id | product 
0x0a20 | GT216 [GeForce GT 220]
0x0a22 | GT216 [GeForce 315]
0x0a23 | GT216 [GeForce 210]
0x0a26 | GT216 [GeForce 405]
0x0a27 | GT216 [GeForce 405]
0x0a28 | GT216 [GeForce GT 230M]
0x0a29 | GT216 [GeForce GT 330M]
0x0a2a | GT216 [GeForce GT 230M]
0x0a2b | GT216 [GeForce GT 330M]
0x0a2c | GT216 [NVS 5100M]
0x0a2d | GT216 [GeForce GT 320M]
0x0a32 | GT216 [GeForce GT 415]
0x0a34 | GT216 [GeForce GT 240M]
0x0a35 | GT216 [GeForce GT 325M]
0x0a38 | GT216 [Quadro 400] 
0x0a3c | GT216 [Quadro FX 880M] 
GT218

device id | product 
0x0a60 | GT218 [GeForce G210] 
0x0a62 | GT218 [GeForce 205]
0x0a63 | GT218 [GeForce 310]
0x0a64 | GT218 [ION]
0x0a65 | GT218 [GeForce 210]
0x0a66 | GT218 [GeForce 310]
0x0a67 | GT218 [GeForce 315]
0x0a68 | GT218 [GeForce G105M]
0x0a69 | GT218 [GeForce G105M]
0x0a6a | GT218 [NVS 2100M]
0x0a6c | GT218 [NVS 3100M]
0x0a6e | GT218 [GeForce 305M]
0x0a6f | GT218 [ION]
0x0a70 | GT218 [GeForce 310M]
0x0a71 | GT218 [GeForce 305M]
0x0a72 | GT218 [GeForce 310M]
0x0a73 | GT218 [GeForce 305M]
0x0a74 | GT218 [GeForce G210M]
0x0a75 | GT218 [GeForce 310M]
0x0a76 | GT218 [ION]
0x0a78 | GT218 [Quadro FX 380 LP]
0x0a7a | GT218 [GeForce 315M]
0x0a7c | GT218 [Quadro FX 380M]
0x10c0 | GT218 [GeForce 9300 GS]
0x10c3 | GT218 [GeForce 8400GS]
0x10c5 | GT218 [GeForce 405]
0x10d8 | GT218 [NVS 300]

MCP89 GPU 

device id | product 

0x08a0 | MCP89 GPU [GeForce 320M]
0x08a2 | MCP89 GPU [GeForce 320M]
0x08a3 | MCP89 GPU [GeForce 320M]
0x08a4 | MCP89 GPU [GeForce 320M]
GF100
device id | product 
0x06c0 | GF100 [GeForce GTX 480]
0x06c4 | GF100 [GeForce GTX 465]
0x06ca | GF100 [GeForce GTX 480M]
0x06cb | GF100 [GeForce GTX 480]
0x06cd | GF100 [GeForce GTX 470]
0x06d1 | GF100 [Tesla C2050 / C2070]
0x06d2 | GF100 [Tesla M2070]
0x06d8 | GF100 [Quadro 6000]
0x06d9 | GF100 [Quadro 5000]
0x06da | GF100 [Quadro 5000M]
0x06dc | GF100 [Quadro 6000]
0x06dd | GF100 [Quadro 4000]
0x06de | GF100 [Tesla T20 Processor]
0x06df | GF100 [Tesla M2070-Q]
GF104 
device id | product 
0x0e22 | GF104 [GeForce GTX 460]
0x0e23 | GF104 [GeForce GTX 460 SE]
0x0e24 | GF104 [GeForce GTX 460 OEM]
0x0e30 | GF104 [GeForce GTX 470M]
0x0e31 | GF104 [GeForce GTX 485M]
0x0e3a | GF104 [Quadro 3000M] 
0x0e3b | GF104 [Quadro 4000M] 
GF114 
device id | product 
0x1200 | GF114 [GeForce GTX 560 Ti]
0x1201 | GF114 [GeForce GTX 560]
0x1202 | GF114 [GeForce GTX 560 Ti OEM]
0x1203 | GF114 [GeForce GTX 460 SE v2]
0x1205 | GF114 [GeForce GTX 460 v2]
0x1206 | GF114 [GeForce GTX 555]
0x1207 | GF114 [GeForce GT 645 OEM]
0x1208 | GF114 [GeForce GTX 560 SE]
0x1210 | GF114 [GeForce GTX 570M]
0x1211 | GF114 [GeForce GTX 580M]
0x1212 | GF114 [GeForce GTX 675M]
0x1213 | GF114 [GeForce GTX 670M]
GF106


device id | product 
0x0dc0 | GF106 [GeForce GT 440]
0x0dc4 | GF106 [GeForce GTS 450]
0x0dc5 | GF106 [GeForce GTS 450]
0x0dc6 | GF106 [GeForce GTS 450]
0x0dcd | GF106 [GeForce GT 555M]
0x0dce | GF106 [GeForce GT 555M]
0x0dd1 | GF106 [GeForce GTX 460M]
0x0dd2 | GF106 [GeForce GT 445M]
0x0dd3 | GF106 [GeForce GT 435M]
0x0dd6 | GF106 [GeForce GT 550M]
0x0dd8 | GF106 [Quadro 2000]
0x0dda | GF106 [Quadro 2000M]


GF116 


device id | product 
0x1241 | GF116 [GeForce GT 545 OEM]
0x124? | GF116 [GeForce GT 545]
0x124? | GF116 [GeForce GTX 550 Ti]
0x124? | GF116 [GeForce GTS 450 Rev. 2]
0x124? | GF116 [GeForce GT 550M]
0x124? | GF116 [GeForce GT 635M]
0x124? | GF116 [GeForce GT 555M]
0x124? | GF116 [GeForce GTS 450 Rev. 3]
0x124? | GF116 [GeForce GT 640 OEM]
0x124? | GF116 [GeForce GT 555M]
0x???? | GF116 [GeForce GTX 560M]
GF108
device id | product 
0x0de0 | GF108 [GeForce GT 440]
0x0de1 | GF108 [GeForce GT 430]
0x0de2 | GF108 [GeForce GT 420]
0x0de3 | GF108 [GeForce GT 635M]
0x0de4 | GF108 [GeForce GT 520]
0x0de5 | GF108 [GeForce GT 530]
0x0de8 | GF108 [GeForce GT 620M]
0x0de9 | GF108 [GeForce GT 630M]
0x0dea | GF108 [GeForce 610M]
0x0deb | GF108 [GeForce GT 555M]
0x0dec | GF108 [GeForce GT 525M]
0x0ded | GF108 [GeForce GT 520M]
0x0dee | GF108 [GeForce GT 415M]
0x0def | GF108 [NVS 5400M]
0x0df0 | GF108 [GeForce GT 425M]
0x0df1 | GF108 [GeForce GT 420M]
0x0df2 | GF108 [GeForce GT 435M]
0x0df3 | GF108 [GeForce GT 420M]
0x0df4 | GF108 [GeForce GT 540M]
0x0df5 | GF108 [GeForce GT 525M]
0x0df6 | GF108 [GeForce GT 550M]
0x0df7 | GF108 [GeForce GT 520M]
0x0df8 | GF108 [Quadro 600]
0x0df9 | GF108 [Quadro 500M]
0x0dfa | GF108 [Quadro 1000M]
0x0dfc | GF108 [NVS 5200M]
0x0f00 | GF108 [GeForce GT 630]
0x0f01 | GF108 [GeForce GT 620]
GF110 
device id | product 
0x1080 | GF110 [GeForce GTX 580]
0x1081 | GF110 [GeForce GTX 570]
0x1082 | GF110 [GeForce GTX 560 Ti]
0x1084 | GF110 [GeForce GTX 560]
0x1086 | GF110 [GeForce GTX 570]
0x1087 | GF110 [GeForce GTX 560 Ti]
0x1088 | GF110 [GeForce GTX 590]
0x1089 | GF110 [GeForce GTX 580]
0x108b | GF110 [GeForce GTX 580]
0x1091 | GF110 [Tesla M2090]
0x109a | GF110 [Quadro 5010M]
0x109b | GF110 [Quadro 7000]
GF119
device id | product 
0x1040 | GF119 [GeForce GT 520]
0x1042 | GF119 [GeForce 510]
0x1048 | GF119 [GeForce 605]
0x1049 | GF119 [GeForce GT 620]
0x104a | GF119 [GeForce GT 610]
0x1050 | GF119 [GeForce GT 520M]
0x1051 | GF119 [GeForce GT 520MX]
0x1052 | GF119 [GeForce GT 520M]
0x1054 | GF119 [GeForce 410M]
0x1055 | GF119 [GeForce 410M]
0x1056 | GF119 [NVS 4200M]
0x1057 | GF119 [NVS 4200M]
0x1058 | GF119 [GeForce 610M]
0x1059 | GF119 [GeForce 610M]
0x105a | GF119 [GeForce 610M]
0x107d | GF119 [NVS 310] 

GF117 
device id | product 
0x1140 | GF117 [GeForce GT 620M]

GK104 


device id | product 
0x1180 | GK104 [GeForce GTX 680]
0x1183 | GK104 [GeForce GTX 660 Ti]
0x1185 | GK104 [GeForce GTX 660]
0x1188 | GK104 [GeForce GTX 690]
0x1189 | GK104 [GeForce GTX 670]
0x1199 | GK104 [GeForce GTX 870M]
0x119f | GK104 [GeForce GTX 780M]
0x11a0 | GK104 [GeForce GTX 680M]
0x11a1 | GK104 [GeForce GTX 670MX]
0x11a2 | GK104 [GeForce GTX 675MX]
0x11a3 | GK104 [GeForce GTX 680MX]
0x11a7 | GK104 [GeForce GTX 675MX]
0x11ba | GK104 [Quadro K5000]
0x11bc | GK104 [Quadro K5000M]
0x11bd | GK104 [Quadro K4000M]
0x11be | GK104 [Quadro K3000M]
0x11bf | GK104 [GRID K2]
GK106
device id | product 
0x11c0 | GK106 [GeForce GTX 660]
0x11c6 | GK106 [GeForce GTX 650 Ti]
0x11e0 | GK106 [GeForce GTX 770M]
0x11fa | GK106 [Quadro K4000]
GK107 
device id | product 
0x0fc0 | GK107 [GeForce GT 640]
0x0fc1 | GK107 [GeForce GT 640]
0x0fc2 | GK107 [GeForce GT 630]
0x0fc6 | GK107 [GeForce GTX 650]
0x0fd1 | GK107 [GeForce GT 650M]
0x0fd2 | GK107 [GeForce GT 640M]
0x0fd3 | GK107 [GeForce GT 640M LE]
0x0fd4 | GK107 [GeForce GTX 660M]
0x0fd5 | GK107 [GeForce GT 650M]
0x0fd8 | GK107 [GeForce GT 640M]
0x0fd9 | GK107 [GeForce GT 645M]
0x0fe0 | GK107 [GeForce GTX 660M]
0x0fe9 | GK107 [GeForce GT 750M Mac Edition]
0x0ff9 | GK107 [Quadro K2000D]
0x0ffa | GK107 [Quadro K600]
0x0ffb | GK107 [Quadro K2000M]
0x0ffc | GK107 [Quadro K1000M]
0x0ffd | GK107 [NVS 510]
0x0ffe | GK107 [Quadro K2000]
0x0fff | GK107 [Quadro 410]
GK110/GK110B 


device id | product 
0x1003 | GK110 [GeForce GTX Titan LE]
0x1004 | GK110 [GeForce GTX 780]
0x1005 | GK110 [GeForce GTX Titan]
0x101f | GK110 [Tesla K20]
0x1020 | GK110 [Tesla K20X]
0x1021 | GK110 [Tesla K20Xm] 
0x1022 | GK110 [Tesla K20c] 
0x1026 | GK110 [Tesla K20s] 
0x1028 | GK110 [Tesla K20m] 
GK208


device id | product 
0x1280 | GK208 [GeForce GT 635]
0x1282 | GK208 [GeForce GT 640 Rev. 2]
0x1284 | GK208 [GeForce GT 630 Rev. 2]
0x1290 | GK208 [GeForce GT 730M]
0x1291 | GK208 [GeForce GT 735M]
0x1292 | GK208 [GeForce GT 740M]
0x1293 | GK208 [GeForce GT 730M]
0x1294 | GK208 [GeForce GT 740M]
0x1295 | GK208 [GeForce 710M]
0x12b9 | GK208 [Quadro K610M]
0x12ba | GK208 [Quadro K510M]


GM107 


device id | product 
0x1381 | GM107 [GeForce GTX 750]
0x1392 | GM107 [GeForce GTX 860M]
0x139a | GM107 [GeForce GTX 950M]
0x139b | GM107 [GeForce GTX 960M]
0x13b0 | GM107 [Quadro M2000M]


GM108 


device id | product 
0x1340 | GM108
0x1341 | GM108 [GeForce 840M]
0x1346 | GM108 [GeForce 930M]
0x1347 | GM108 [GeForce 940M]
0x134d | GM108 [GeForce 940MX]


GM204 


device id | product 
0x13c0 | GM204 [GeForce GTX 980]
0x13c2 | GM204 [GeForce GTX 970]
0x13d7 | GM204 [GeForce GTX 980M]
0x13d8 | GM204 [GeForce GTX 970M] 
0x13d9 | GM204 [GeForce GTX 965M] 
GM206
device id | product 
0x1401 | GM206 [GeForce GTX 960]
0x1407 | GM206 [GeForce GTX 750 v2]
0x1427 | GM206 [GeForce GTX 965M v2]
GP100 
device id | product 
0x15f7 | GP100 [Tesla P100 PCIe 12GB]
0x15f8 | GP100 [Tesla P100 PCIe 16GB]
0x15f9 | GP100 [Tesla P100 SXM2 16GB]
GP102 
device id | product 
0x1b00 | GP102 [GeForce TITAN X]
0x1b02 | GP102 [GeForce TITAN Xp]
0x1b06 | GP102 [GeForce GTX 1080 Ti]
0x1b30 | GP102 [Quadro P6000]
0x1b38 | GP102 [Tesla P40]
GP104 


device id | product 
0x1b80 | GP104 [GeForce GTX 1080]
0x1b81 | GP104 [GeForce GTX 1070]
0x1b82 | GP104 [GeForce GTX 1070 Ti]
0x1b83 | GP104 [GeForce GTX 1060 6GB]
0x1b84 | GP104 [GeForce GTX 1060 3GB]
0x1ba0 | GP104 [GeForce GTX 1080 Mobile]
0x1ba1 | GP104 [GeForce GTX 1070 Mobile]
0x1ba2 | GP104 [GeForce GTX 1070 Mobile]
0x1bb0 | GP104 [Quadro P5000]
0x1bb3 | GP104 [Tesla P4]
0x1bb6 | GP104 [Quadro P5000 Mobile]
0x1bb7 | GP104 [Quadro P4000 Mobile]
0x1bb8 | GP104 [Quadro P3000 Mobile]
0x1be0 | GP104 [GeForce GTX 1080 Mobile]
0x1be1 | GP104 [GeForce GTX 1070 Mobile]
GP106
device id | product 
0x1c02 | GP106 [GeForce GTX 1060 3GB]
0x1c03 | GP106 [GeForce GTX 1060 6GB]
0x1c20 | GP106 [GeForce GTX 1060 Mobile]
0x1c23 | GP106 [GeForce GTX 1060]
0x1c60 | GP106 [GeForce GTX 1060 Mobile]
0x1c61 | GP106 [GeForce GTX 1050 Ti Mobile]
0x1c62 | GP106 [GeForce GTX 1050 Mobile]
GP107 
device id | product 
0x1c81 | GP107 [GeForce GTX 1050] 
0x1c82 | GP107 [GeForce GTX 1050 Ti] 
0x1c83 | GP107 [GeForce GTX 1050 3GB] 
0x1c8c | GP107 [GeForce GTX 1050 Ti Mobile] 
0x1c8d | GP107 [GeForce GTX 1050 Mobile] 
0x1c8f | GP107 [GeForce GTX 1050 Ti Max-Q]
0x1c92 | GP107 [GeForce GTX 1050 Max-Q]
GP108 
device id | product 
0x1d01 | GP108 [GeForce GT 1030]
0x1d10 | GP108 [GeForce MX150]
0x1d12 | GP108 [GeForce MX150]
GV100 


device id | product 
0x1d81 | GV100 [TITAN V]
0x1db1 | GV100 [Tesla V100 SXM2 16GB]
0x1db4 | GV100 [Tesla V100 PCIe 16GB]
0x1db5 | GV100 [Tesla V100 SXM2 32GB]
0x1db6 | GV100 [Tesla V100 PCIe 32GB]
0x1dba | GV100 [Quadro GV100]
TU102
device id | product 
0x1e02 | TU102 [TITAN RTX]
0x1e04 | TU102 [GeForce RTX 2080 Ti]
0x1e07 | TU102 [GeForce RTX 2080 Ti]
0x1e30 | TU102 [Quadro RTX 8000] (0x10de 0x129e)
0x1e30 | TU102 [Quadro RTX 6000]
0x1e3c | TU102 [Quadro RTX 6000]
TU104
device id | product
0x1e82 | TU104 [GeForce RTX 2080]
0x1e87 | TU104 [GeForce RTX 2080]
0x1e89 | TU104 [GeForce RTX 2060]
0x1e90 | TU104 [GeForce RTX 2080 Mobile]
0x1eb0 | TU104 [Quadro RTX 5000]
0x1eb1 | TU104 [Quadro RTX 4000]
0x1ed0 | TU104 [GeForce RTX 2080 Mobile]
TU106 
device id | product 
0x1f02 | TU106 [GeForce RTX 2070]
0x1f07 | TU106 [GeForce RTX 2070]
0x1f08 | TU106 [GeForce RTX 2060]
0x1f10 | TU106 [GeForce RTX 2070 Mobile]
0x1f11 | TU106 [GeForce RTX 2060 Mobile]
0x1f50 | TU106 [GeForce RTX 2070 Mobile]
0x1f51 | TU106 [GeForce RTX 2060 Mobile]
TU116 
device id | product 
0x2182 | TU116 [GeForce GTX 1660 Ti]
0x2184 | TU116 [GeForce GTX 1660]
TU117 


device id | product 
0x1f82 | TU117 [GeForce GTX 1650]
0x1f91 | TU117 [GeForce GTX 1650 Mobile]
2.3.3 GPU HDA codecs


device id | product 

0x0be2 | GT216 HDA
0x0be3 | GT218 HDA
0x0be4 | GT215 HDA
0x0be5 | GF100 HDA
0x0be9 | GF106 HDA
0x0bea | GF108 HDA
0x0beb | GF104 HDA
0x0bee | GF116 HDA
0x0e08 | GF119 HDA
0x0e09 | GF110 HDA
0x0e0a | GK104 HDA
0x0e0b | GK106 HDA
0x0e0c | GF114 HDA
0x0e0f | GK208 HDA
0x0e1a | GK110 HDA
0x0e1b | GK107 HDA
0x0fb0 | GM200 HDA
0x0fb8 | GP108 HDA
0x0fb9 | GP107 HDA
0x0fba | GM206 HDA
0x0fbb | GM204 HDA
0x0fbc | GM107 HDA
0x10ef | GP102 HDA
0x10f0 | GP104 HDA
0x10f1 | GP106 HDA
0x10f2 | GV100 HDA
0x10f7 | TU102 HDA
0x10f8 | TU104 HDA
0x10f9 | TU106 HDA
0x1aeb | TU116 HDA
0x???? | TU117 HDA


2.3.4 GPU USB controllers 


device id | product 

0x1ad6 | TU102 USB
0x1ad7 | TU102 USB UCSI Controller
0x1ad8 | TU104 USB
0x1ad9 | TU104 USB UCSI Controller
0x1ada | TU106 USB
0x1adb | TU106 USB UCSI Controller


2.3.5 BR02


The BR02 aka HSI is a transparent PCI-Express to AGP bridge. It can be used to connect a PCIe GPU to an AGP bus, or
the other way around. Its PCI device id shadows the actual GPU's device id.
device id | product

0x00f1 | BR02+NV43 [GeForce 6600 GT]
0x00f2 | BR02+NV43 [GeForce 6600]
0x00f3 | BR02+NV43 [GeForce 6200]
0x00f4 | BR02+NV43 [GeForce 6600 LE]
0x00f5 | BR02+G71 [GeForce 7800 GS]
0x00f6 | BR02+NV43 [GeForce 6800 GS/XT]
0x00f8 | BR02+NV40 [Quadro FX 3400/4400]
0x00f9 | BR02+NV40 [GeForce 6800 Series GPU]
0x00fa | BR02+NV36 [GeForce PCX 5750]
0x00fb | BR02+NV35 [GeForce PCX 5900]
0x00fc | BR02+NV34 [GeForce PCX 5300 / Quadro FX 330]
0x00fd | BR02+NV34 [Quadro FX 330]
0x00fe | BR02+NV35 [Quadro FX 1300]
0x00ff | BR02+NV18 [GeForce PCX 4300]
0x02e0 | BR02+G73 [GeForce 7600 GT]
0x02e1 | BR02+G73 [GeForce 7600 GS]
0x02e2 | BR02+G73 [GeForce 7300 GT]
0x02e3 | BR02+G71 [GeForce 7900 GS]
0x02e4 | BR02+G71 [GeForce 7950 GT]
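
Because the BR02 exposes the device id of the GPU behind it, rows in this table name both chips as `BR02+GPU`. The following is a minimal Python sketch, assuming the plain-text `device id | chip [product]` row layout used in these tables, that splits such rows into their parts. `parse_row` and `ROW_RE` are hypothetical names for illustration, not part of any existing tool.

```python
import re

# Hypothetical pattern for one table row, e.g.
# "0x00f1 | BR02+NV43 [GeForce 6600 GT]" or "0x0640 | G96 [GeForce 9500 GT]".
ROW_RE = re.compile(r"^(0x[0-9a-fA-F]{4})\s*\|\s*(.+?)\s*\[(.+)\]$")


def parse_row(line):
    """Split a table row into device id, bridge, GPU and product name.

    A "+" in the chip column is treated as "bridge+GPU", matching the
    BR02 naming above; rows without one are assumed to have no bridge.
    Returns None for lines that do not look like a table row.
    """
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    chips = m.group(2).split("+")
    if len(chips) == 2:
        bridge, gpu = chips
    else:
        bridge, gpu = None, m.group(2)
    return {
        "device_id": int(m.group(1), 16),  # int() accepts the "0x" prefix with base 16
        "bridge": bridge,
        "gpu": gpu,
        "product": m.group(3),
    }


if __name__ == "__main__":
    for row in ("0x00f9 | BR02+NV40 [GeForce 6800 Series GPU]",
                "0x0640 | G96 [GeForce 9500 GT]"):
        print(parse_row(row))
```

This simple pattern only accepts rows that end in the bracketed product name; rows with trailing annotations, such as the subsystem ids after one of the TU102 entries, would need a looser pattern.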


2.3.6 BR03 


The BR03 aka NF100 is a PCI-Express switch with 2 downstream x16 ports. It's used on NV40 generation dual-GPU
cards.


device id | product 


0x01b3 | BR03 [GeForce 7900 GX2/7950 GX2]


2.3.7 BR04 


The BR04 aka NF200 is a PCI-Express switch with 4 downstream x16 ports. It's used on Tesla and Fermi generation
dual-GPU cards, as well as some SLI-capable motherboards. 


device id | product
0x05b1 | BR04 [motherboard]
0x05b8 | BR04 [GeForce GTX 295]
0x05b9 | BR04 [GeForce GTX 590]
0x05be | BR04 [GeForce 9800 GX2/Quadro Plex S4/Tesla S*]


2.3.8 Motherboard chipsets 


NV1A [nForce 220 IGP / 420 IGP / 415 SPP] 


The northbridge of the nForce1 chipset, paired with MCP.
device id | product

0x01a0 | NV1A GPU [GeForce2 MX IGP]
0x01a4 | NV1A host bridge
0x01a5 | NV1A host bridge [?]
0x01a6 | NV1A host bridge [?]
0x01a8 | NV1A memory controller [?]
0x01a9 | NV1A memory controller [?]
0x01aa | NV1A memory controller #3, 64-bit
0x01ab | NV1A memory controller #3, 128-bit
0x01ac | NV1A memory controller #1
0x01ad | NV1A memory controller #2
0x01b7 | NV1A/NV2A AGP bridge


Note: 0x01b7 is also used on NV2A. 


NV2A [XGPU] 


The northbridge of the Xbox, paired with MCP.


device id | product 

0x02a0 | NV2A GPU

0x02a5 | NV2A host bridge 
0x02a6 | NV2A memory controller 
0x01b7 | NV1A/NV2A AGP bridge


Note: 0x01b7 is also used on NV1A.


MCP 


The southbridge of the nForce1 chipset and the Xbox, paired with NV1A or NV2A.


device id | product 

0x01b0 | MCP APU
0x01b1 | MCP AC'97
0x01b2 | MCP LPC bridge
0x01b4 | MCP SMBus controller
0x01b8 | MCP PCI bridge
0x01bc | MCP IDE controller
0x01c1 | MCP MC'97
0x01c2 | MCP USB controller
0x01c3 | MCP ethernet controller


NV1F [nForce2 IGP/SPP] 


The northbridge of the nForce2 chipset, paired with MCP2 or MCP2A.




device id | product 

0x01e0 | NV1F host bridge
0x01e8 | NV1F AGP bridge
0x01ea | NV1F memory controller #1
0x01eb | NV1F memory controller #1
0x01ec | NV1F memory controller #4
0x01ed | NV1F memory controller #3
0x01ee | NV1F memory controller #2
0x01ef | NV1F memory controller #5


MCP2 


The southbridge of the nForce2 chipset, original revision. Paired with NV1F.


device id | product 

0x0060 | MCP2 LPC bridge
0x0064 | MCP2 SMBus controller
0x0065 | MCP2 IDE controller
0x0066 | MCP2 ethernet controller
0x0067 | MCP2 USB controller
0x0068 | MCP2 USB 2.0 controller
0x0069 | MCP2 MC'97
0x006a | MCP2 AC'97
0x006b | MCP2 APU
0x006c | MCP2 PCI bridge
0x006d | MCP2 internal PCI bridge for 3com ethernet
0x006e | MCP2 Firewire controller


MCP2A 


The southbridge of the nForce2 400 chipset. Paired with NV1F.


device id | product 

0x0080 | MCP2A LPC bridge
0x0084 | MCP2A SMBus controller
0x0085 | MCP2A IDE controller
0x0086 | MCP2A ethernet controller (class 0200)
0x0087 | MCP2A USB controller
0x0088 | MCP2A USB 2.0 controller
0x0089 | MCP2A MC'97
0x008a | MCP2A AC'97
0x008b | MCP2A PCI bridge
0x008c | MCP2A ethernet controller (class 0680)
0x008e | MCP2A SATA controller


CK8 


The nForce3-150 chipset.
device id | product

0x00d0 | CK8 LPC bridge
0x00d1 | CK8 host bridge
0x00d2 | CK8 AGP bridge
0x00d4 | CK8 SMBus controller
0x00d5 | CK8 IDE controller
0x00d6 | CK8 ethernet controller
0x00d7 | CK8 USB controller
0x00d8 | CK8 USB 2.0 controller
0x00d9 | CK8 MC'97
0x00da | CK8 AC'97
0x00dd | CK8 PCI bridge


CK8S 


The nForce3-250 chipset.


device id | product 

0x00df | CK8S ethernet controller (class 0680)
0x00e0 | CK8S LPC bridge
0x00e1 | CK8S host bridge
0x00e2 | CK8S AGP bridge
0x00e3 | CK8S SATA controller #1
0x00e4 | CK8S SMBus controller
0x00e5 | CK8S IDE controller
0x00e6 | CK8S ethernet controller (class 0200)
0x00e7 | CK8S USB controller
0x00e8 | CK8S USB 2.0 controller
0x00e9 | CK8S MC'97
0x00ea | CK8S AC'97
0x00ec | CK8S ???? (class 0780)
0x00ed | CK8S PCI bridge
0x00ee | CK8S SATA controller #0


CK804 


The AMD nForce4 chipset, standalone or paired with C19 or C51 to make the nForce4 SLI x16 chipset.
device id | product

0x0050 | CK804 LPC bridge 
0x0051 | CK804 LPC bridge 
0x0052 | CK804 SMBus controller
0x0053 | CK804 IDE controller
0x0054 | CK804 SATA controller #0
0x0055 | CK804 SATA controller #1
0x0056 | CK804 ethernet controller (class 0200)
0x0057 | CK804 ethernet controller (class 0680)
0x0058 | CK804 MC'97
0x0059 | CK804 AC'97
0x005a | CK804 USB controller
0x005b | CK804 USB 2.0 controller
0x005c | CK804 PCI subtractive bridge
0x005d | CK804 PCI-Express port
0x005e | CK804 memory controller #0
0x005f | CK804 memory controller #12
0x00d3 | CK804 memory controller #10


C19 


The Intel nForce4 northbridge, paired with MCP04 or CK804.


device id | product 

0x006f | C19 memory controller #3
0x0070 | C19 host bridge
0x0071 | C19 host bridge
0x0072 | C19 host bridge [?]
0x0073 | C19 host bridge [?]
0x0074 | C19 memory controller #1
0x0075 | C19 memory controller #2
0x0076 | C19 memory controller #10
0x0078 | C19 memory controller #11
0x0079 | C19 memory controller #12
0x007a | C19 memory controller #13
0x007b | C19 memory controller #14
0x007c | C19 memory controller #15
0x007d | C19 memory controller #16
0x007e | C19 PCI-Express port
0x007f | C19 memory controller #1
0x00b4 | C19 memory controller #4


MCP04


The Intel nForce4 southbridge, paired with C19.
device id | product

0x0030 | MCP04 LPC bridge
0x0034 | MCP04 SMBus controller
0x0035 | MCP04 IDE controller
0x0036 | MCP04 SATA controller #0
0x0037 | MCP04 ethernet controller (class 0200)
0x0038 | MCP04 ethernet controller (class 0680)
0x0039 | MCP04 MC'97
0x003a | MCP04 AC'97
0x003b | MCP04 USB controller
0x003c | MCP04 USB 2.0 controller
0x003d | MCP04 PCI subtractive bridge
0x003e | MCP04 SATA controller #1
0x003f | MCP04 memory controller


C51 


The AMD nForce4xx/nForce5xx northbridge, paired with CK804, MCP51, or MCP55.


device id | product 

0x02f0 | C51 memory controller #0
0x02f1 | C51 memory controller #0
0x02f2 | C51 memory controller #0
0x02f3 | C51 memory controller #0
0x02f4 | C51 memory controller #0
0x02f5 | C51 memory controller #0
0x02f6 | C51 memory controller #0
0x02f7 | C51 memory controller #0
0x02f8 | C51 memory controller #3
0x02f9 | C51 memory controller #4
0x02fa | C51 memory controller #1
0x02fb | C51 PCI-Express x16 port
0x02fc | C51 PCI-Express x1 port #0
0x02fd | C51 PCI-Express x1 port #1
0x02fe | C51 memory controller #2
0x02ff | C51 memory controller #5
0x027e | C51 memory controller #7
0x027f | C51 memory controller #6


MCP51 


The AMD nForce5xx southbridge, paired with C51 or C55.
device id | product

0x0260 | MCP51 LPC bridge
0x0261 | MCP51 LPC bridge
0x0262 | MCP51 LPC bridge [?]
0x0263 | MCP51 LPC bridge [?]
0x0264 | MCP51 SMBus controller
0x0265 | MCP51 IDE controller
0x0266 | MCP51 SATA controller #0
0x0267 | MCP51 SATA controller #1
0x0268 | MCP51 ethernet controller (class 0200)
0x0269 | MCP51 ethernet controller (class 0680)
0x026a | MCP51 MC'97
0x026b | MCP51 AC'97
0x026c | MCP51 HDA
0x026d | MCP51 USB controller
0x026e | MCP51 USB 2.0 controller
0x026f | MCP51 PCI subtractive bridge
0x0270 | MCP51 memory controller #0
0x0271 | MCP51 SMU
0x0272 | MCP51 memory controller #12


C55 


Paired with MCP51 or MCP55. 
device id | product

0x03a0 | C55 host bridge [?] 
0x03a1 | C55 host bridge
0x03a2 | C55 host bridge 
0x03a3 | C55 host bridge 
0x03a4 | C55 host bridge [?]
0x03a5 | C55 host bridge [?] 
0x03a6 | C55 host bridge [?] 
0x03a7 | C55 host bridge [?]
0x03a8 | C55 memory controller #5 
0x03a9 | C55 memory controller #3 
0x03aa | C55 memory controller #2 
0x03ab | C55 memory controller #4 
0x03ac | C55 memory controller #1 
0x03ad | C55 memory controller #10 
0x03ae | C55 memory controller #11 
0x03af | C55 memory controller #12 
0x03b0 | C55 memory controller #13 
0x03b1 | C55 memory controller #14 
0x03b2 | C55 memory controller #15
0x03b3 | C55 memory controller #16 
0x03b4 | C55 memory controller #7 
0x03b5 | C55 memory controller #6 
0x03b6 | C55 memory controller #20 
0x03b7 | C55 PCI-Express x16/x8 port 
0x03b8 | C55 PCI-Express x8 port
0x03b9 | C55 PCI-Express x1 port #0 
0x03ba | C55 memory controller #22 
0x03bb | C55 PCI-Express x1 port #1
0x03bc | C55 memory controller #21 


Todo: shouldn't 0x03b8 support x4 too? 


MCP55 


Standalone or paired with C51, C55 or C73. 




device id | product 

0x0360 | MCP55 LPC bridge 

0x0361 | MCP55 LPC bridge 

0x0362 | MCP55 LPC bridge

0x0363 | MCP55 LPC bridge 

0x0364 | MCP55 LPC bridge 

0x0365 | MCP55 LPC bridge [?] 

0x0366 | MCP55 LPC bridge [?] 

0x0367 | MCP55 LPC bridge [?] 

0x0368 | MCP55 SMBus controller 
0x0369 | MCP55 memory controller #0 
0x036a | MCP55 memory controller #12 
0x036b | MCP55 SMU 

0x036c | MCP55 USB controller 

0x036d | MCP55 USB 2.0 controller 
0x036e | MCP55 IDE controller 

0x036f | MCP55 SATA [???]

0x0370 | MCP55 PCI subtractive bridge 
0x0371 | MCP55 HDA 

0x0372 | MCP55 ethernet controller (class 0200)
0x0373 | MCP55 ethernet controller (class 0680)
0x0374 | MCP55 PCI-Express x1/x4 port #0
0x0375 | MCP55 PCI-Express x1/x8 port
0x0376 | MCP55 PCI-Express x8 port
0x0377 | MCP55 PCI-Express x8/x16 port
0x0378 | MCP55 PCI-Express x1/x4 port #1
0x037e | MCP55 SATA controller [?]
0x037f | MCP55 SATA controller


MCP61 


Standalone. 
device id | product

0x03e0 | MCP61 LPC bridge 

0x03e1 | MCP61 LPC bridge
0x03e2 | MCP61 memory controller #0
0x03e3 | MCP61 LPC bridge [?]
0x03e4 | MCP61 HDA [?]
0x03e5 | MCP61 ethernet controller [?]
0x03e6 | MCP61 ethernet controller [?]
0x03e7 | MCP61 SATA controller [?]
0x03e8 | MCP61 PCI-Express x16 port
0x03e9 | MCP61 PCI-Express x1 port
0x03ea | MCP61 memory controller #0
0x03eb | MCP61 SMBus controller
0x03ec | MCP61 IDE controller
0x03ee | MCP61 ethernet controller [?]
0x03ef | MCP61 ethernet controller (class 0680)
0x03f0 | MCP61 HDA
0x03f1 | MCP61 USB controller
0x03f2 | MCP61 USB 2.0 controller
0x03f3 | MCP61 PCI subtractive bridge
0x03f4 | MCP61 SMU
0x03f5 | MCP61 memory controller #12
0x03f6 | MCP61 SATA controller
0x03f7 | MCP61 SATA controller [?]


MCP65 
Standalone. 
device id | product 
0x0440 | MCP65 LPC bridge [?] 
0x0441 | MCP65 LPC bridge 
0x0442 | MCP65 LPC bridge 
0x0443 | MCP65 LPC bridge [?] 
0x0444 | MCP65 memory controller #0 
0x0445 | MCP65 memory controller #12 
0x0446 | MCP65 SMBus controller 
0x0447 | MCP65 SMU 
0x0448 | MCP65 IDE controller 
0x0449 | MCP65 PCI subtractive bridge 
0x044a | MCP65 HDA 
0x044b | MCP65 HDA [?] 
0x044c | MCP65 SATA controller (AHCI mode) [?]
0x044d | MCP65 SATA controller (AHCI mode)
0x044e | MCP65 SATA controller (AHCI mode) [?] 
0x044f | MCP65 SATA controller (AHCI mode) [?] 
0x0450 | MCP65 ethernet controller (class 0200) 
0x0451 | MCP65 ethernet controller [?] 
0x0452 | MCP65 ethernet controller (class 0680)




Table 6 — continued from previous page 


device id | product 

0x0453 | MCP65 ethernet controller [?]

0x0454 | MCP65 USB controller #0 

0x0455 | MCP65 USB 2.0 controller #0 

0x0456 | MCP65 USB controller #1 

0x0457 | MCP65 USB 2.0 controller #1 

0x0458 | MCP65 PCI-Express x8/x16 port 

0x0459 | MCP65 PCI-Express x8 port 

0x045a | MCP65 PCI-Express x1/x2 port 

0x045b | MCP65 PCI-Express x2 port 

0x045c | MCP65 SATA controller (compatibility mode) [?] 
0x045d | MCP65 SATA controller (compatibility mode)
0x045e | MCP65 SATA controller (compatibility mode) [?] 
0x045f | MCP65 SATA controller (compatibility mode) [?] 


MCP67 


Standalone. 


device id | product 


0x0541 | MCP67 memory controller #12 
0x0542 | MCP67 SMBus controller
0x0543 | MCP67 SMU
0x0547 | MCP67 memory controller #0
0x0548 | MCP67 LPC bridge
0x054c | MCP67 ethernet controller (class 0200)
0x054d | MCP67 ethernet controller [?]
0x054e | MCP67 ethernet controller [?]
0x054f | MCP67 ethernet controller [?]
0x0550 | MCP67 SATA controller (compatibility mode)
0x0551 | MCP67 SATA controller (compatibility mode) [?]
0x0552 | MCP67 SATA controller (compatibility mode) [?]
0x0553 | MCP67 SATA controller (compatibility mode) [?]
0x0554 | MCP67 SATA controller (AHCI mode)
0x0555 | MCP67 SATA controller (AHCI mode) [?]
0x0556 | MCP67 SATA controller (AHCI mode) [?]
0x0557 | MCP67 SATA controller (AHCI mode) [?]
0x0558 | MCP67 SATA controller (AHCI mode) [?]
0x0559 | MCP67 SATA controller (AHCI mode) [?]
0x055a | MCP67 SATA controller (AHCI mode) [?]
0x055b | MCP67 SATA controller (AHCI mode) [?]
0x055c | MCP67 HDA
0x055d | MCP67 HDA [?]
0x055e | MCP67 USB controller
0x055f | MCP67 USB 2.0 controller
0x0560 | MCP67 IDE controller
0x0561 | MCP67 PCI subtractive bridge
0x0562 | MCP67 PCI-Express x16 port
0x0563 | MCP67 PCI-Express x1 port

0x0561 | MCP67 PCI subtractive bridge 

0x0562 | MCP67 PCI-Express x16 port 

0x0563 | MCP67 PCI-Express x1 port 
C73

Paired with MCP55. 
device id | product 
0x0800 | C73 host bridge 
0x0801 | C73 host bridge [?]
0x0802 | C73 host bridge [?]
0x0803 | C73 host bridge [?]
0x0804 | C73 host bridge [?]
0x0805 | C73 host bridge [?]
0x0806 | C73 host bridge [?]
0x0807 | C73 host bridge [?]
0x0808 | C73 memory controller #1 
0x0809 | C73 memory controller #2 
0x080a | C73 memory controller #3 
0x080b | C73 memory controller #4 
0x080c | C73 memory controller #5 
0x080d | C73 memory controller #6 
0x080e | C73 memory controller #7/#17
0x080f | C73 memory controller #10 
0x0810 | C73 memory controller #11 
0x0811 | C73 memory controller #12 
0x0812 | C73 memory controller #13 
0x0813 | C73 memory controller #14 
0x0814 | C73 memory controller #15 
0x0815 | C73 PCI-Express x? port #0 
0x0817 | C73 PCI-Express x? port #1 
0х081а | C73 memory controller #16 

MCP73

Standalone.

device id | product
0x056a | MCP73 USB 2.0 controller
0x056c | MCP73 IDE controller
0x056d | MCP73 PCI subtractive bridge
0x056e | MCP73 PCI-Express x16 port
0x056f | MCP73 PCI-Express x1 port
0x07c0 | MCP73 host bridge
0x07c1 | MCP73 host bridge
0x07c2 | MCP73 host bridge [?]
0x07c3 | MCP73 host bridge
0x07c4 | MCP73 host bridge [?]
0x07c5 | MCP73 host bridge
0x07c6 | MCP73 host bridge [?]
0x07c7 | MCP73 host bridge
0x07c8 | MCP73 memory controller #34
0x07cb | MCP73 memory controller #1
0x07cd | MCP73 memory controller #10
0x07ce | MCP73 memory controller #11
0x07cf | MCP73 memory controller #12
0x07d0 | MCP73 memory controller #13
0x07d1 | MCP73 memory controller #14
0x07d2 | MCP73 memory controller #15
0x07d3 | MCP73 memory controller #16
0x07d6 | MCP73 memory controller #20
0x07d7 | MCP73 LPC bridge
0x07d8 | MCP73 SMBus controller
0x07d9 | MCP73 memory controller #32
0x07da | MCP73 SMU
0x07dc | MCP73 ethernet controller (class 0200)
0x07dd | MCP73 ethernet controller [?]
0x07de | MCP73 ethernet controller [?]
0x07df | MCP73 ethernet controller [?]
0x07f0 | MCP73 SATA controller (compatibility mode)
0x07f1 | MCP73 SATA controller (compatibility mode) [?]
0x07f2 | MCP73 SATA controller (compatibility mode) [?]
0x07f3 | MCP73 SATA controller (compatibility mode) [?]
0x07f4 | MCP73 SATA controller (AHCI mode)
0x07f5 | MCP73 SATA controller (AHCI mode) [?]
0x07f6 | MCP73 SATA controller (AHCI mode) [?]
0x07f7 | MCP73 SATA controller (AHCI mode) [?]
0x07f8 | MCP73 SATA controller (RAID mode)
0x07f9 | MCP73 SATA controller (RAID mode) [?]
0x07fa | MCP73 SATA controller (RAID mode) [?]
0x07fb | MCP73 SATA controller (RAID mode) [?]
0x07fc | MCP73 HDA
0x07fd | MCP73 HDA [?]
0x07fe | MCP73 USB controller
MCP77

Standalone.

device id | product
0x0568 | MCP77 memory controller #14
0x0569 | MCP77 IGP bridge
0x0570-0x057f | MCP* ethernet controller (class 0200 alt) [XXX]
0x0580-0x058f | MCP* SATA controller (alt ID) [XXX]
0x0590-0x059f | MCP* HDA (alt ID) [XXX]
0x05a0-0x05af | MCP* IDE (alt ID) [XXX]
0x0751 | MCP77 memory controller #12
0x0752 | MCP77 SMBus controller
0x0753 | MCP77 SMU
0x0754 | MCP77 memory controller #0
0x0755 | MCP77 memory controller #0 [?]
0x0756 | MCP77 memory controller #0 [?]
0x0757 | MCP77 memory controller #0 [?]
0x0759 | MCP77 IDE controller
0x075a | MCP77 PCI subtractive bridge
0x075b | MCP77 PCI-Express x1/x4 port
0x075c | MCP77 LPC bridge
0x075d | MCP77 LPC bridge
0x075e | MCP77 LPC bridge
0x0760 | MCP77 ethernet controller (class 0200)
0x0761 | MCP77 ethernet controller [?]
0x0762 | MCP77 ethernet controller [?]
0x0763 | MCP77 ethernet controller [?]
0x0764 | MCP77 ethernet controller (class 0680)
0x0765 | MCP77 ethernet controller [?]
0x0766 | MCP77 ethernet controller [?]
0x0767 | MCP77 ethernet controller [?]
0x0774 | MCP77 HDA
0x0775 | MCP77 HDA [?]
0x0776 | MCP77 HDA [?]
0x0777 | MCP77 HDA [?]
0x0778 | MCP77 PCI-Express 2.0 x8/x16 port
0x0779 | MCP77 PCI-Express 2.0 x8 port
0x077a | MCP77 PCI-Express x1 port
0x077b | MCP77 USB controller #0
0x077c | MCP77 USB 2.0 controller #0
0x077d | MCP77 USB controller #1
0x077e | MCP77 USB 2.0 controller #1
0x0ad0-0x0ad3 | MCP77 SATA controller (compatibility mode)
0x0ad4-0x0ad7 | MCP77 SATA controller (AHCI mode)
0x0ad8-0x0adb | MCP77 SATA controller (RAID mode)

MCP79

Standalone.

device id | product
0x0570-0x057f | MCP* ethernet controller (class 0200 alt) [XXX]
0x0580-0x058f | MCP* SATA controller (alt ID) [XXX]
0x0590-0x059f | MCP* HDA (alt ID) [XXX]


0x0a80 | MCP79 host bridge
0x0a81 | MCP79 host bridge [?]
0x0a82 | MCP79 host bridge
0x0a83 | MCP79 host bridge
0x0a84 | MCP79 host bridge
0x0a85 | MCP79 host bridge [?]
0x0a86 | MCP79 host bridge
0x0a87 | MCP79 host bridge [?]
0x0a88 | MCP79 memory controller #1
0x0a89 | MCP79 memory controller #33
0x0a8d | MCP79 memory controller #13
0x0a8e | MCP79 memory controller #14
0x0a8f | MCP79 memory controller #15
0x0a90 | MCP79 memory controller #16
0x0a94 | MCP79 memory controller #23
0x0a95 | MCP79 memory controller #24
0x0a98 | MCP79 memory controller #34
0x0aa0 | MCP79 IGP bridge
0x0aa2 | MCP79 SMBus controller
0x0aa3 | MCP79 SMU
0x0aa4 | MCP79 memory controller #31
0x0aa5 | MCP79 USB controller #0
0x0aa6 | MCP79 USB 2.0 controller #0
0x0aa7 | MCP79 USB controller #1
0x0aa8 | MCP79 USB controller [?]
0x0aa9 | MCP79 USB 2.0 controller #1
0x0aaa | MCP79 USB 2.0 controller [?]
0x0aab | MCP79 PCI subtractive bridge
0x0aac | MCP79 LPC bridge
0x0aad | MCP79 LPC bridge
0x0aae | MCP79 LPC bridge
0x0aaf | MCP79 LPC bridge
0x0ab0 | MCP79 ethernet controller (class 0200)
0x0ab1 | MCP79 ethernet controller [?]
0x0ab2 | MCP79 ethernet controller [?]
0x0ab3 | MCP79 ethernet controller [?]
0x0ab4-0x0ab7 | MCP79 SATA controller (compatibility mode)
0x0ab8-0x0abb | MCP79 SATA controller (AHCI mode)
0x0abc-0x0abf | MCP79 SATA controller (RAID mode) [XXX: actually 0x0ab0-0x0abb are accepted by hw without trickery]
0x0ac0 | MCP79 HDA
0x0ac1 | MCP79 HDA [?]
0x0ac2 | MCP79 HDA [?]
0x0ac3 | MCP79 HDA [?]
0x0ac4 | MCP79 PCI-Express 2.0 x16 port
0x0ac5 | MCP79 PCI-Express 2.0 x4/x8 port
0x0ac6 | MCP79 PCI-Express 2.0 x1/x4 port
0x0ac7 | MCP79 PCI-Express 2.0 x1 port
0x0ac8 | MCP79 PCI-Express 2.0 x4 port
MCP89 
Standalone. 




device id | product
0x0580-0x058f | MCP* SATA controller (alt ID) [XXX]
0x0590-0x059f | MCP* HDA (alt ID) [XXX]
0x0d60 | MCP89 host bridge
0x0d68 | MCP89 memory controller #1
0x0d69 | MCP89 memory controller #33
0x0d6d | MCP89 memory controller #10
0x0d6e | MCP89 memory controller #11
0x0d6f | MCP89 memory controller #12
0x0d70 | MCP89 memory controller #13
0x0d71 | MCP89 memory controller #20
0x0d72 | MCP89 memory controller #21
0x0d75 | MCP89 memory controller #110
0x0d76 | MCP89 IGP bridge
0x0d79 | MCP89 SMBus controller
0x0d7a | MCP89 SMU
0x0d7b | MCP89 memory controller #31
0x0d7d | MCP89 ethernet controller (class 0200)
0x0d80 | MCP89 LPC bridge
0x0d84-0x0d87 | MCP89 SATA controller (compatibility mode)
0x0d88-0x0d8b | MCP89 SATA controller (AHCI mode)
0x0d8c-0x0d8f | MCP89 SATA controller (RAID mode)
0x0d94-0x0d97 | MCP89 HDA [XXX: actually 1-0xf]
0x0d9a | MCP89 PCI-Express x1 port #0
0x0d9b | MCP89 PCI-Express x1 port #1
0x0d9c | MCP89 USB controller
0x0d9d | MCP89 USB 2.0 controller


2.3.9 Tegra 


T20 


device id | product 


0x0bf0 | T20 PCI-Express x4 port
0x0bf1 | T20 PCI-Express x2 port


T30 


device id | product 


0x0e1c | T30 PCI-Express x4 port
0x0e1d | T30 PCI-Express x2 port


T124 


Also known as Tegra K1. 
device id | product
0x0e12 | T124 PCI-Express x4 port
0x0e13 | T124 PCI-Express x1 port


T210 


Also known as Tegra X1.


device id | product 
0x0fae | T210 PCI-Express x4 port
0x0faf | T210 PCI-Express x1 port


T186 


Also known as Tegra X2. 


device id | product 
0x10e5 | T186 PCI-Express x4 port
0x10e6 | T186 PCI-Express x1 port


2.4 PCI/PCIE/AGP bus interface and card management logic 


Contents: 


2.4.1 PCI BARs and other means of accessing the GPU 


Contents 


* PCI BARs and other means of accessing the GPU
  - Nvidia GPU BARs, IO ports, and memory areas
  - PCI/PCIE configuration space
  - BAR0: MMIO registers
  - BAR1: VRAM aperture
  - BAR2/BAR3: RAMIN aperture
  - BAR2: NV3 indirect memory access
  - BAR5: G80 indirect memory access
  - BAR6: PCI ROM aperture
  - INTA: the card interrupt
  - Legacy VGA IO ports and memory

Nvidia GPU BARs, IO ports, and memory areas


The nvidia GPUs expose the following areas to the outside world through PCI:

* PCI configuration space / PCIE extended configuration space
* MMIO registers: BAR0 - memory, 0x1000000 bytes or more depending on card type
* VRAM aperture: BAR1 - memory, 0x1000000 bytes or more depending on card type [NV3+ only]
* indirect memory access IO ports: BAR2 - 0x100 bytes of IO port space [NV3 only]
* ???: BAR2 [only NV1x IGPs?]
* ???: BAR2 [only NV20?]
* RAMIN aperture: BAR2 or BAR3 - memory, 0x1000000 bytes or more depending on card type [NV40+]
* indirect memory access IO ports: BAR5 - 0x80 bytes of IO port space [G80+]
* PCI ROM aperture
* PCI INTA interrupt line
* legacy VGA IO ports: 0x3b0-0x3bb and 0x3c0-0x3df [can be disabled in PCI config]
* legacy VGA memory: 0xa0000-0xbffff [can be disabled in PCI config]


PCI/PCIE configuration space 


Nvidia GPUs, like all PCI devices, have PCI configuration space. Its contents are described in pci. 


BAR0: MMIO registers


This is the main control space of the card - all engines are controlled through it, and it contains alternate means to 
access most of the other spaces. This, along with the VRAM / RAMIN apertures, is everything that’s needed to fully 
control the cards. 


This space is a 16MB area of memory sparsely populated with areas representing individual engines, which in turn
are sparsely populated with registers. The list of engines depends on the card type. While there are no known registers
outside the 16MB range, the BAR itself can be larger on NV40+ cards if configured so by straps.


Its address is set up through PCI BAR 0. The BAR uses 32-bit addressing and is non-prefetchable memory. 


The registers inside this BAR are 32-bit, with the exception of areas that are aliases of the byte-oriented VGA legacy
IO ports. They should be accessed through aligned 32-bit memory reads/writes. On pre-NV1A cards, the registers
are always little-endian; on NV1A+ cards, the endianness of the whole area can be selected by a switch in PMC. The
endianness switch, however, only affects BAR0 accesses to the MMIO space; accesses from inside the card are
always little-endian.


A particularly important subarea of MMIO space is PMC, the card's master control. This subarea is present on all
nvidia GPUs at addresses 0x000000 through 0x000fff. It contains GPU id information, Big Red Switches for engines
that can be turned off, and master interrupt control. It's described in more detail in pmc.


For full list of MMIO areas, see mmio. 
BAR1: VRAM aperture


This is an area of prefetchable memory that maps to the card's VRAM. On native PCIE cards, it uses 64-bit addressing, 
on native PCI/AGP ones it uses 32-bit addressing. 


On non-TURBOCACHE pre-G80 cards and on G80+ cards with BAR1 VM disabled, BAR addresses map directly to
VRAM addresses. On TURBOCACHE cards, BAR1 is made of controllable VRAM and GART windows [see NV44
host memory interface]. G80+ cards have a mode where all BAR references go through the card's VM subsystem; see
g80-host-mem and gf100-host-mem.


On NV3 cards, this BAR also contains the RAMIN access aperture at address 0xc00000 [see NV3 VRAM structure and
usage].


Todo: map out the BAR fully 


The BAR size depends on card type:

* NV3: 16MB [with RAMIN]
* NV4: 16MB
* NV5: 32MB
* NV10:NV17: 128MB
* NV17:G80: 64MB-512MB, set via straps
* G80+: 64MB-64GB, set via straps


Note that the BAR size is independent of the actual VRAM size, although on pre-NV30 cards the BAR is guaranteed not
to be smaller than VRAM. This means it may be impossible to map all of the card's memory through the BAR on
NV30+ cards.


BAR2/BAR3: RAMIN aperture 


RAMIN is, on pre-G80 cards, a special area at the end of VRAM that contains various control structures. RAMIN
starts from the end of VRAM and its addresses go in the reverse direction, so it needs a special mapping to be accessed
the way it will be used. While pre-NV40 cards limited its size to 1MB and could fit the mapping in BAR0 (or BAR1 on
NV3), NV40+ cards allow much bigger RAMIN addresses. The RAMIN BAR provides such a RAMIN mapping on NV40
family cards.


G80 did away with the special RAMIN area, but it kept the BAR around. It works like BAR1, but is independent of
it and can use a distinct VM DMA object. As opposed to BAR1, all accesses done through BAR3 will be automatically
byte-swapped in 32-bit chunks, like BAR0, if the big-endian switch is on. It's commonly used to map control structures
for kernel use, while BAR1 is used to map user-accessible memory.


The BAR uses 64-bit addressing on native PCIE cards, 32-bit addressing on native PCI/AGP. It uses the BAR2 slot on
native PCIE, BAR3 on native PCI/AGP. It is non-prefetchable memory on cards up to and including G200, prefetchable
memory on MCP77+. The size is at least 16MB and is set via straps.


BAR2: NV3 indirect memory access


An area of IO ports used to access BAR0 or BAR1 indirectly by real mode code that cannot map high memory
addresses. Present only on NV3.


Todo: RE it. or not. 
BAR5: G80 indirect memory access


An area of IO ports used to access BAR0, BAR1, and BAR3 indirectly by real mode code that cannot map high
memory addresses. Present on G80+ cards. On earlier cards, the indirect access feature of VGA IO ports can be used
instead. This BAR can also be disabled via straps.


Todo: It's present on some NV4x


This area is 0x80 bytes of IO ports, but only the first 0x20 bytes are actually used; the rest are empty. The ports are all
treated as 32-bit ports. They are:

* BAR5+0x00: when read, signature: 0x2469fdb9. When written, master enable: write 1 to enable the remaining ports,
  0 to disable them. Only bit 0 of the written value is taken into account. When the remaining ports are disabled, they
  read as 0xffffffff.
* BAR5+0x04: enable. If bit 0 is 1, the "data" ports are active; otherwise they're inactive and merely store the last
  written value.
* BAR5+0x08: BAR0 address port. Bits 0-1 and 24-31 are ignored.
* BAR5+0x0c: BAR0 data port. Reads and writes are translated to BAR0 reads and writes at the address specified by
  the BAR0 address port.
* BAR5+0x10: BAR1 address port. Bits 0-1 are ignored.
* BAR5+0x14: BAR1 data port. Reads and writes are translated to BAR1 reads and writes at the address specified by
  the BAR1 address port.
* BAR5+0x18: BAR3 address port. Bits 0-1 and 24-31 are ignored.
* BAR5+0x1c: BAR3 data port. Reads and writes are translated to BAR3 reads and writes at the address specified by
  the BAR3 address port.


BAR0 addresses are masked to the low 24 bits, allowing access to exactly 16MB of MMIO space. The BAR1 addresses
aren't masked, and the window actually allows access to more BAR space than BAR1 itself: up to 4GB of VRAM
or VM space can be accessed this way. BAR3 addresses, on the other hand, are masked to the low 24 bits even though the
real BAR3 is larger.
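The address-port masking rules above fit in a few lines of Python. This is a minimal sketch, not driver code. It only computes which BAR address a data-port access would reach, and the names `ADDRESS_PORTS`, `DATA_PORTS` and `effective_address` are hypothetical. It performs no real IO port access and deliberately ignores the master enable (BAR5+0x00) and data-port enable (BAR5+0x04) bits.

```python
# Minimal model of the BAR5 indirect-access address ports (G80+).
# Hypothetical helper names; no real IO port access is performed, and the
# master enable (BAR5+0x00) and data-port enable (BAR5+0x04) are ignored.

BAR5_SIGNATURE = 0x2469FDB9  # value read from BAR5+0x00

# address port offset -> (target BAR, mask of address bits that are honoured)
ADDRESS_PORTS = {
    0x08: ("BAR0", 0x00FFFFFC),  # bits 0-1 and 24-31 ignored
    0x10: ("BAR1", 0xFFFFFFFC),  # only bits 0-1 ignored
    0x18: ("BAR3", 0x00FFFFFC),  # bits 0-1 and 24-31 ignored
}

# data port offset -> address port that supplies its target address
DATA_PORTS = {0x0C: 0x08, 0x14: 0x10, 0x1C: 0x18}


def effective_address(data_port: int, address_value: int) -> tuple:
    """Return (BAR name, address) reached through `data_port` after
    `address_value` was written to the matching address port."""
    bar, mask = ADDRESS_PORTS[DATA_PORTS[data_port]]
    return bar, address_value & mask


if __name__ == "__main__":
    # Bit 24 is dropped for BAR0, so this lands at MMIO offset 0x4.
    print(effective_address(0x0C, 0x01000004))
```

The model shows why BAR1 is the only window that can reach beyond 16MB: its mask keeps bits 24-31.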


BAR6: PCI ROM aperture 


Todo: figure out size 


Todo: figure out NV3 


Todo: verify G80 


The nvidia GPUs expose their BIOS as a standard PCI ROM. The exposed ROM aliases either the actual BIOS EEPROM
or the shadow BIOS in VRAM. This setting is exposed in PCI config space. If the "shadow enabled" PCI config
register is 0, the PROM MMIO area is enabled, and both PROM and the PCI ROM aperture will access the EEPROM.
Disabling the shadowing has a side effect of disabling video output on pre-G80 cards. If shadow is enabled, EEPROM
is disabled, PROM reads will return garbage, and the PCI ROM aperture will access the VRAM shadow copy of the BIOS.
On pre-G80 cards, the shadow BIOS is located at address 0 of RAMIN; on G80+ cards, the shadow BIOS is pointed to
by the PDISPLAY.VGA.ROM_WINDOW register. See g80-vga for details.


INTA: the card interrupt 


Todo: MSI 


The GPU reports all interrupts through the PCI INTA line. The interrupt enable and status registers are located in PMC 
area - see pmc-intr. 


Legacy VGA IO ports and memory 


The nvidia GPU cards are backwards compatible with VGA and expose the usual VGA ranges: IO ports 0x3b0-0x3bb
and 0x3c0-0x3df, memory at 0xa0000-0xbffff. The VGA ranges can, however, be disabled in PCI config space. The
VGA registers and memory are still accessible through their aliases in BAR0, and disabling the legacy ranges has no
effect on the operation of the card. The IO range contains an extra top-level register that allows indirect access to
the MMIO area for use by real mode code, as well as many nvidia-specific extra registers in the VGA subunits. For
details, see nv3-vga.


2.5 Power, thermal, and clock management 


Contents: 


2.5.1 Clock management 


The nvidia GPUs, like most electronic devices, use clock signals to control their operation. Since they're complicated 
devices made of many subunits with different performance needs, there are multiple clock signals for various parts of 
the GPU. 


The set of available clocks and the method of setting them varies a lot with the card type. 


Contents: 


2.5.2 PDAEMON: card management microprocessor


Contents: 


falcon parameters 


Present on:
    v0: GT215:MCP89
    v1: MCP89:GF100
    v2: GF100:GF119
    v3: GF119:GK104
    v4: GK104:GK110
    v5: GK110:GK208
    v6: GK208:GM107
    v7: GM107+
BAR0 address: 0x10a000
PMC interrupt line:
    v0-v1: 18
    v2+: 24
PMC enable bit:
    v0-v1: none, use reg 0x22210 instead
    v2+: 13
Version:
    v0-v2: 3
    v3,v4: 4
    v5: 4.1
    v6,v7: 5
Code segment size:
    v0: 0x4000
    v1:v7: 0x6000
    v7: 0x8000
Data segment size:
    v0: 0x3000
    v1+: 0x6000
Fifo size:
    v0-v1: 0x10
    v2: 3
Xfer slots:
    v0-v2: 8
    v3-v4: 0x10
Secretful:
    v0:v7: no
    v7: yes
Code TLB index bits:
    v0-v2: 8
    v3+: 9
Code ports: 1
Data ports: 4
Version 4 unknown caps: 31, 27
Unified address space: yes [on v3+]
IO addressing type:
    v0-v2: indexed
    v3-v7: simple
Core clock:
    v0-v1: gt215-clock-dclk
    v2-v7: gf100-clock-dclk
Tesla VM engine: 0xe
Tesla VM client: 0x11
Tesla context DMA: [none]
Fermi VM engine: 0x17
Fermi VM client: HUB 0x12
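Every PDAEMON register listed later in this section is given both as an MMIO offset (relative to the BAR0 address 0x10a000 above) and as a falcon I[] address. Every listed pair is consistent with I[] = MMIO offset * 0x40, so a per-index MMIO stride of 4 becomes an I[] stride of 0x100. The sketch below encodes that relationship. It is an inference from the listings for the indexed IO addressing of v0-v2, not a documented rule, and it says nothing about the "simple" addressing of v3+. The helper names are hypothetical.

```python
# Conversion between PDAEMON MMIO offsets and falcon I[] addresses.
# ASSUMPTION: the *0x40 factor is inferred from the MMIO / I[] pairs listed
# in this section (indexed IO addressing, v0-v2); it is not a documented rule.

PDAEMON_BAR0_BASE = 0x10A000  # "BAR0 address" from the parameter list


def mmio_to_io(offset: int) -> int:
    """MMIO offset within PDAEMON -> I[] address (indexed addressing)."""
    if offset % 4:
        raise ValueError("PDAEMON registers are 32-bit aligned")
    return offset * 0x40


def io_to_mmio(io_addr: int) -> int:
    """I[] address -> MMIO offset within PDAEMON (inverse of mmio_to_io)."""
    if io_addr % 0x100:
        raise ValueError("indexed I[] addresses are multiples of 0x100")
    return io_addr // 0x40


def bar0_address(offset: int) -> int:
    """Absolute BAR0 address of a PDAEMON register."""
    return PDAEMON_BAR0_BASE + offset


if __name__ == "__main__":
    # SUBINTR: MMIO 0x688 / I[0x1a200]
    print(hex(mmio_to_io(0x688)), hex(bar0_address(0x688)))
```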


Interrupts:

Line | Type  | Present on  | Name               | Description
8    | edge  | GT215:GF100 | MEMIF_PORT_INVALID | MEMIF port not initialised
9    | edge  | GT215:GF100 | MEMIF_FAULT        | MEMIF VM fault
9    | edge  | GF100-      | MEMIF_BREAK        | MEMIF breakpoint
10   | level | all         | PMC_DAEMON         | PMC interrupts routed directly to PDAEMON
11   | level | all         | SUBINTR            | second-level interrupt
12   | level | all         | THERM              | PTHERM subinterrupts routed to PDAEMON
13   | level | all         | SIGNAL             | input signal rise/fall interrupts
14   | level | all         | TIMER              | the timer interrupt
15   | level | all         | IREDIR_PMC         | PMC interrupts redirected to PDAEMON by IREDIR
Status bits:

Bit | Present on          | Name       | Description
0   | all                 | FALCON     | Falcon unit
1   | all                 | EPWR_GRAPH | PGRAPH engine power gating
2   | all                 | EPWR_VDEC  | video decoding engine power gating
3   | all                 | MEMIF      | Memory interface
4   | GT215:MCP89, GF100- | USER       | User controlled
4   | MCP89:GF100         | EPWR_VCOMP | PVCOMP engine power gating
5   | MCP89:GF100         | USER       | User controlled


IO registers: pdaemon-io 


PCOUNTER signals 


Todo: write me 
Todo: discuss mismatched clock thing

* ???
* IREDIR_STATUS
* IREDIR_HOST_REQ
* IREDIR_TRIGGER_DAEMON
* IREDIR_TRIGGER_HOST
* IREDIR_PMC
* IREDIR_INTR
* MMIO_BUSY
* MMIO_IDLE
* MMIO_DISABLED
* TOKEN_ALL_USED
* TOKEN_NONE_USED
* TOKEN_FREE
* TOKEN_ALLOC
* FIFO_PUT_0_WRITE
* FIFO_PUT_1_WRITE
* FIFO_PUT_2_WRITE
* FIFO_PUT_3_WRITE
* INPUT_CHANGE
* OUTPUT_2
* INPUT_2
* THERM_ACCESS_BUSY


Todo: figure out the first signal


Todo: document MMIO_* signals


Todo: document INPUT_*, OUTPUT_*


Second-level interrupts 


Because falcon has space for only 8 engine interrupts and PDAEMON needs many more, a second-level interrupt 
register was introduced: 


MMIO 0x688 / I[0x1a200]: SUBINTR

* bit 0: H2D - host to PDAEMON scratch register written
* bit 1: FIFO - host to PDAEMON fifo pointer updated
* bit 2: EPWR_GRAPH - PGRAPH engine power control
* bit 3: EPWR_VDEC - video decoding engine power control
* bit 4: MMIO - indirect MMIO access error
* bit 5: IREDIR_ERR - interrupt redirection error
* bit 6: IREDIR_HOST_REQ - interrupt redirection request
* bit 7: ???
* bit 8: ??? - goes to 0x670
* bit 9: EPWR_VCOMP [MCP89] - PVCOMP engine power control
* bit 13: ??? [GF119-] - goes to 0x888


Todo: figure out bits 7, 8 


Todo: more bits in 10-12? 


The second-level interrupts are merged into a single level-triggered interrupt and delivered to falcon interrupt line 11.
This line is asserted whenever any bit of the SUBINTR register is non-0. A given SUBINTR bit is set to 1 whenever the
input second-level interrupt line is 1, but will not auto-clear when the input line goes back to 0: only writing 1 to
that bit in SUBINTR will clear it. This effectively means that SUBINTR bits have to be cleared after the downstream
interrupt has been cleared. Note that SUBINTR has no corresponding enable bit; if an interrupt needs to be disabled,
software should use the enable registers corresponding to individual second-level interrupts instead.
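The sticky, write-1-to-clear behaviour of SUBINTR is easy to get wrong, so here is a small Python model of it. It assumes a hypothetical `SubIntr` class in which `sample()` stands for the hardware latching the current second-level input lines. It only illustrates the semantics described above and does not describe the actual circuit.

```python
class SubIntr:
    """Hypothetical model of the sticky SUBINTR register (MMIO 0x688)."""

    def __init__(self) -> None:
        self.value = 0  # latched second-level interrupt bits

    def sample(self, lines: int) -> None:
        """Latch every second-level input line that is currently 1.
        Inputs that are 0 never clear a bit."""
        self.value |= lines

    def write(self, value: int) -> None:
        """Write-1-to-clear: only the bits written as 1 are cleared."""
        self.value &= ~value

    @property
    def line_11(self) -> bool:
        """Level-triggered falcon interrupt line 11: high while any bit is set."""
        return self.value != 0


H2D, FIFO = 1 << 0, 1 << 1  # bit positions from the SUBINTR list above

if __name__ == "__main__":
    s = SubIntr()
    s.sample(FIFO)  # a FIFO interrupt fires
    s.sample(0)     # the input drops, but the bit stays latched
    s.write(FIFO)   # only an explicit write of 1 clears it
    print(s.line_11)
```

Because a bit is latched again on the next sample while its input is still 1, a handler in this model should first clear the downstream source (for example the FIFO's own status register) and only then write 1 to the SUBINTR bit.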


Note that the IREDIR_HOST_REQ interrupt has special semantics when cleared; see the IREDIR_TRIGGER
documentation.


User busy indication 


To enable the microcode to set the "PDAEMON is busy" flag without actually making any PDAEMON subunit perform
computation, bit 4 [bit 5 on MCP89:GF100] of the falcon status register is connected to a dummy unit whose busy
status is controlled directly by the user:


MMIO 0x420 / I[0x10800]: USER_BUSY Read/write, only bit 0 is valid. If set, falcon status line 4 or 5 [USER] is
set to 1 [busy]; otherwise it's set to 0 [idle].


Todo: what could possibly use PDAEMON's busy status?


Host <-> PDAEMON communication 


Contents 


* Host <-> PDAEMON communication
  - Introduction
  - Submitting data to PDAEMON: FIFO
  - Submitting data to host: RFIFO
  - Host to PDAEMON scratch register: H2D
  - PDAEMON to host scratch register: D2H
  - Scratch registers: DSCRATCH


Introduction 


There are 5 PDAEMON-specific channels that can be used for communication between the host and PDAEMON:

* FIFO: data submission from host to PDAEMON on 4 independent FIFOs in data segment, with interrupts
  generated whenever the PUT register is written
* RFIFO: data submission from PDAEMON to host through a FIFO in data segment
* H2D: a single scratch register for host -> PDAEMON communication, with interrupts generated whenever it's
  written
* D2H: a single scratch register for PDAEMON -> host communication
* DSCRATCH: 4 scratch registers


Submitting data to PDAEMON: FIFO 


These registers are meant to be used for submitting data from host to PDAEMON. The PUT register is the FIFO head,
written by the host, and the GET register is the FIFO tail, written by PDAEMON. Interrupts can be generated whenever
the PUT register is written. How exactly the data buffer works is software's business. Note that due to the very limited
special semantics for FIFO usage, these registers may as well be used as [possibly interruptful] scratch registers.


MMIO 0x4a0+i*4 / I[0x12800+i*0x100], i<4: FIFO_PUT[i] The FIFO head pointer, effectively a 32-bit scratch
register. Writing it causes bit i of FIFO_INTR to be set.


MMIO 0x4b0+i*4 / I[0x12c00+i*0x100], i<4: FIFO_GET[i] The FIFO tail pointer, effectively a 32-bit scratch
register.


MMIO 0x4c0 / I[0x13000]: FIFO_INTR The status register for FIFO_PUT write interrupts. Write a bit with 1 to
clear it. Whenever a bit is set both in FIFO_INTR and FIFO_INTR_EN, the FIFO [#1] second-level interrupt
line to SUBINTR is asserted. Bit i corresponds to FIFO #i, and only bits 0-3 are valid.


MMIO 0x4c4 / I[0x13100]: FIFO_INTR_EN The enable register for FIFO_PUT write interrupts. Read/write, only
the 4 low bits are valid. Bit assignment is the same as in FIFO_INTR.
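To make the PUT-write / status / enable interplay concrete, here is a minimal Python model of the four FIFOs. The class and method names are hypothetical. It covers only the register semantics given above, not how software lays out the data buffer in the data segment.

```python
class PdaemonFifos:
    """Hypothetical model of FIFO_PUT/FIFO_GET/FIFO_INTR/FIFO_INTR_EN."""

    def __init__(self) -> None:
        self.put = [0] * 4  # FIFO_PUT[i], head pointers written by the host
        self.get = [0] * 4  # FIFO_GET[i], tail pointers written by PDAEMON
        self.intr = 0       # FIFO_INTR, bits 0-3
        self.intr_en = 0    # FIFO_INTR_EN, bits 0-3

    def write_put(self, i: int, value: int) -> None:
        self.put[i] = value & 0xFFFFFFFF
        self.intr |= 1 << i  # writing FIFO_PUT[i] sets FIFO_INTR bit i

    def write_get(self, i: int, value: int) -> None:
        self.get[i] = value & 0xFFFFFFFF  # plain scratch register, no interrupt

    def write_intr(self, value: int) -> None:
        self.intr &= ~value & 0xF  # write 1 to clear

    def write_intr_en(self, value: int) -> None:
        self.intr_en = value & 0xF  # only the low 4 bits are valid

    @property
    def subintr_fifo_line(self) -> bool:
        """FIFO second-level interrupt line (SUBINTR bit 1)."""
        return (self.intr & self.intr_en) != 0


if __name__ == "__main__":
    f = PdaemonFifos()
    f.write_intr_en(1 << 2)  # only FIFO #2 may interrupt
    f.write_put(2, 0x40)     # host publishes new data on FIFO #2
    print(f.subintr_fifo_line)
```

In this model the FIFO is empty whenever `put[i] == get[i]`. That convention is an assumption of the sketch; the hardware attaches no meaning to the pointer values.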


In addition, the FIFO circuitry exports four signals to PCOUNTER:

* FIFO_PUT_0_WRITE: pulses for one cycle whenever FIFO_PUT[0] is written
* FIFO_PUT_1_WRITE: pulses for one cycle whenever FIFO_PUT[1] is written
* FIFO_PUT_2_WRITE: pulses for one cycle whenever FIFO_PUT[2] is written
* FIFO_PUT_3_WRITE: pulses for one cycle whenever FIFO_PUT[3] is written
Submitting data to host: RFIFO


The RFIFO is like one of the 4 FIFOs, except it's supposed to go from PDAEMON to the host and doesn't have the
interrupt generation powers.


MMIO 0x4c8 / I[0x13200]: RFIFO_PUT
MMIO 0x4cc / I[0x13300]: RFIFO_GET
The RFIFO head and tail pointers. Both are effectively 32-bit scratch registers.


Host to PDAEMON scratch register: H2D 


H2D is a scratch register supposed to be written by the host and read by PDAEMON. It generates an interrupt when 
written. 


MMIO 0x4d0 / I[0x13400]: H2D A 32-bit scratch register. Sets H2D_INTR when written.


MMIO 0x4d4 / I[0x13500]: H2D_INTR The status register for the H2D write interrupt. Only bit 0 is valid. Set when
the H2D register is written, cleared when 1 is written to bit 0. When this and H2D_INTR_EN are both set, the H2D
[#0] second-level interrupt line to SUBINTR is asserted.


MMIO 0x4d8 / I[0x13600]: H2D_INTR_EN The enable register for the H2D write interrupt. Only bit 0 is valid.


PDAEMON to host scratch register: D2H 


D2H is just a scratch register supposed to be written by PDAEMON and read by the host. It has no interrupt generation
powers.


MMIO 0x4dc / I[0x13700]: D2H A 32-bit scratch register.


Scratch registers: DSCRATCH 


DSCRATCH[] are just 4 32-bit scratch registers usable for PDAEMON<->HOST communication or any other purposes.


MMIO 0x5d0+i*4 / I[0x17400+i*0x100], i<4: DSCRATCH[i] A 32-bit scratch register.


Hardware mutexes 


The PDAEMON has hardware support for 16 busy-waiting mutexes accessed by up to 254 clients simultaneously. The
clients may be anything able to read and write the PDAEMON registers: code running on the host, on PDAEMON, or on
any other falcon engine with MMIO access powers.


The clients are identified by tokens. Tokens are 8-bit numbers in the 0x01-0xfe range. Tokens may be assigned to clients
statically by software, or dynamically by hardware. Only tokens 0x08-0xfe will be dynamically allocated by hardware;
software may use statically assigned tokens 0x01-0x07 even if dynamic tokens are in use at the same time. The
registers used for dynamic token allocation are:


MMIO 0x488 / I[0x12200]: TOKEN_ALLOC Read-only; each read of this register allocates a free token and returns
it as the read result. If there are no free tokens, 0xff is returned.


MMIO 0x48c / I[0x12300]: TOKEN_FREE A write to this register will free a token, ie. return it to the pool
used by TOKEN_ALLOC. Only the low 8 bits of the written value are used. Attempting to free a token outside of
the dynamic allocation range [0x08-0xfe] or a token already in the free queue will have no effect. Reading this
register will show the last written value, invalid or not.
The free tokens are stored in a FIFO: the freed tokens will be used by TOKEN_ALLOC in the order of freeing. After
reset, the free token FIFO will contain tokens 0x08-0xfe in ascending order.
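The allocation rules above behave like a FIFO-ordered free list. The Python sketch below models TOKEN_ALLOC reads and TOKEN_FREE writes under that reading. `TokenPool` and its methods are hypothetical names, and the read-back value of TOKEN_FREE is not modelled.

```python
from collections import deque


class TokenPool:
    """Hypothetical model of TOKEN_ALLOC / TOKEN_FREE."""

    FIRST, LAST = 0x08, 0xFE  # dynamic token range
    NO_TOKEN = 0xFF           # TOKEN_ALLOC result when the pool is empty

    def __init__(self) -> None:
        # After reset the free-token FIFO holds 0x08-0xfe in ascending order.
        self.free = deque(range(self.FIRST, self.LAST + 1))

    def read_alloc(self) -> int:
        """A read of TOKEN_ALLOC: hand out the oldest free token, or 0xff."""
        return self.free.popleft() if self.free else self.NO_TOKEN

    def write_free(self, value: int) -> None:
        """A write to TOKEN_FREE: only the low 8 bits matter; out-of-range
        tokens and tokens already in the free queue are silently ignored."""
        token = value & 0xFF
        if self.FIRST <= token <= self.LAST and token not in self.free:
            self.free.append(token)


if __name__ == "__main__":
    pool = TokenPool()
    t = pool.read_alloc()  # first token after reset
    pool.write_free(t)     # goes to the back of the queue
    print(hex(t), len(pool.free))
```

Since freed tokens go to the back of the queue, a just-released token is the last one to be handed out again. This makes stale-token mixups less likely.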


The actual mutex locking and unlocking is done by the MUTEX_TOKEN registers:


MMIO 0x580+i*4 / I[0x16000+i*0x100], i<16: MUTEX_TOKEN[i] The 16 mutexes. A value of 0 means unlocked;
any other value means locked by the client holding the corresponding token. Only the low 8 bits of the
written value are used. A write of 0 will unlock the mutex and will always succeed. A write of 0x01-0xfe will
succeed only if the mutex is currently unlocked. A write of 0xff is invalid and will always fail. A failed write
has no effect.
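One plausible way for a client to use these registers, assumed here rather than taken from this document, is to write its token and read the register back: the lock is held if the read returns that token. The Python sketch below models the write rules above and that busy-wait pattern. `MutexBank`, `try_lock` and `lock` are hypothetical names, and the retry limit is only there so the sketch cannot spin forever.

```python
class MutexBank:
    """Hypothetical model of the 16 MUTEX_TOKEN[i] registers."""

    def __init__(self) -> None:
        self.token = [0] * 16  # 0 = unlocked, otherwise the owner's token

    def write(self, i: int, value: int) -> None:
        t = value & 0xFF  # only the low 8 bits are used
        if t == 0:
            self.token[i] = 0  # unlock always succeeds
        elif t != 0xFF and self.token[i] == 0:
            self.token[i] = t  # lock succeeds only when unlocked
        # 0xff, or a lock attempt on a held mutex: the write fails silently

    def read(self, i: int) -> int:
        return self.token[i]


def try_lock(bank: MutexBank, i: int, my_token: int) -> bool:
    """Write our token, then read back to see whether the write stuck."""
    bank.write(i, my_token)
    return bank.read(i) == (my_token & 0xFF)


def lock(bank: MutexBank, i: int, my_token: int, max_tries: int = 1000) -> bool:
    """Busy-wait for mutex i; gives up after max_tries attempts."""
    for _ in range(max_tries):
        if try_lock(bank, i, my_token):
            return True
    return False
```

The model also shows that the hardware never checks ownership on unlock: any client writing 0 releases the mutex, so software has to enforce that only the holder does so.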


The token allocation circuitry additionally exports four signals to PCOUNTER: 


* TOKEN_ALL_USED: 1 iff all tokens are currently allocated [ie. a read from TOKEN_ALLOC would return
  0xff]
* TOKEN_NONE_USED: 1 iff no tokens are currently allocated [ie. tokens 0x08-0xfe are all in the free token
  queue]
* TOKEN_FREE: pulses for 1 cycle whenever TOKEN_FREE is written, even with an invalid value
* TOKEN_ALLOC: pulses for 1 cycle whenever TOKEN_ALLOC is read, even if allocation fails


CRC computation 


The PDAEMON has a very simple CRC accelerator. Specifically, it can perform the CRC accumulation operation
on 32-bit chunks using the standard CRC-32 polynomial of 0xedb88320. The current CRC residual is stored in the
CRC_STATE register:


MMIO 0x494 / I[0x12500]: CRC_STATE The current CRC residual. Read/write.

The data to add to the CRC is written to the CRC_DATA register:


MMIO 0x490 / I[0x12400]: CRC_DATA When written, appends the 32-bit LE value to the running CRC residual
in CRC_STATE. When read, returns the last value written. Write operation:


CRC_STATE ^= value;
for (i = 0; i < 32; i++) {
    if (CRC_STATE & 1) {
        CRC_STATE >>= 1;
        CRC_STATE ^= 0xedb88320;
    } else {
        CRC_STATE >>= 1;
    }
}


To compute a CRC: 
1. Write the initial CRC residue to CRC_STATE 
2. Write all data to CRC_DATA, in 32-bit chunks 
3. Read CRC_STATE, xor its value with the final constant, use that as the CRC. 


If the data block to CRC has size that is not a multiple of 32 bits, the extra bits at the end [or the beginning] have to be 
handled manually. 
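As a cross-check of the register semantics, the Python sketch below emulates CRC_STATE/CRC_DATA in software and runs the three steps above. It assumes the parameters of the common CRC-32 used by zlib: an initial residue of 0xffffffff and a final xor constant of 0xffffffff. This document does not state those values. With them, the result should match `zlib.crc32` for inputs whose length is a multiple of 4 bytes, because folding a little-endian 32-bit word and shifting 32 times is equivalent to processing its 4 bytes one at a time.

```python
import zlib


def crc_data_write(state: int, value: int) -> int:
    """One write to CRC_DATA: fold a 32-bit value into the CRC_STATE residual."""
    state ^= value & 0xFFFFFFFF
    for _ in range(32):
        if state & 1:
            state = (state >> 1) ^ 0xEDB88320
        else:
            state >>= 1
    return state


def pdaemon_crc32(data: bytes, init: int = 0xFFFFFFFF,
                  final_xor: int = 0xFFFFFFFF) -> int:
    """Steps 1-3 above; `init` and `final_xor` are assumed CRC-32 defaults."""
    if len(data) % 4:
        raise ValueError("trailing bytes must be handled manually")
    state = init                                            # 1. write CRC_STATE
    for off in range(0, len(data), 4):
        word = int.from_bytes(data[off:off + 4], "little")  # 32-bit LE chunk
        state = crc_data_write(state, word)                 # 2. write CRC_DATA
    return state ^ final_xor                                # 3. read CRC_STATE, xor


if __name__ == "__main__":
    sample = b"PDAEMON!"  # 8 bytes, a multiple of 4
    assert pdaemon_crc32(sample) == zlib.crc32(sample)
    print(hex(pdaemon_crc32(sample)))
```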
The timer


Aside from the usual falcon timers, PDAEMON has its own timer. The timer can be configured as either one-shot or 
periodic, can run on either daemon clock or PTIMER clock divided by 64, and generates interrupts. The following 
registers deal with the timer: 


MMIO 0x4e0 / I[0x13800]: TIMER_START The 32-bit count the timer starts counting down from. Read/write.
For periodic mode, the period will be equal to TIMER_START + 1 source cycles.


MMIO 0x4e4 / I[0x13900]: TIMER_TIME The current value of the timer, read-only. If
TIMER_CTRL.RUNNING is set, this will decrease by 1 on every rising edge of the source clock.
If such a rising edge causes this register to become 0, TIMER_INTR bit 8 [TIMER] is set. The behavior
of a rising edge when this register is already 0 depends on the timer mode: in ONESHOT mode, nothing will
happen. In PERIODIC mode, the timer will be reset to the value from TIMER_START. Note that interrupts
won't be generated if the timer becomes 0 when copying the value from TIMER_START, whether caused
by starting the timer or beginning a new PERIODIC period. This means that using PERIODIC mode with a
TIMER_START of 0 will never generate any interrupts.


MMIO 0x4e8 / I[0x13a00]: TIMER СТКІ, 


e bit 0: RUNNING - when 0, the timer is stopped, when 1, the timer is runinng. Setting this bit to 1 when it 
was previously 0 will also copy the TIMER, START value to TIMER TIME. 


* bit 4: SOURCE - selects the source clock 
- 0: DCLK - daemon clock, effectively timer decrements by | on every daemon cycle 


- 1: PTIMER B5 - PTIMER time bit 5 [ie. bit 10 of TIME LOW]. Since timer decrements by 1 on 
every rising edge of the clock, this effectively decrements the counter on every 64th PTIMER clock. 


* bit 8: MODE - selects the timer mode 
— 0: ONESHOT - timer will halt after reaching 0 
— 1: PERIODIC - timer will restart from TIMER, START after reaching 0 
ММІО 0x680 / I[0x1a000]: TIMER INTR 


• bit 8: TIMER - set whenever TIMER TIME becomes 0 except by a copy from TIMER START, write 1 
to this bit to clear it. When this and bit 8 of TIMER, INTR EN are set at the same time, falcon interrupt 
line #14 [TIMER] is asserted. 


MMIO 0х684 / I[0x1a100]: TIMER INTR EN 


* bit 8: TIMER - when set, timer interrupt delivery to falcon interrupt line #14 is enabled.
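The countdown rules above can be checked with a small behavioural model. This is a sketch under stated assumptions, not a description of the hardware implementation: `FalconTimerModel` and `interrupt_edges` are hypothetical names, each call to `rising_edge()` stands for one rising edge of the selected source clock (DCLK or PTIMER_B5), and only TIMER_TIME, the RUNNING bit and TIMER_INTR bit 8 are modelled.

```python
from dataclasses import dataclass


@dataclass
class FalconTimerModel:
    """Hypothetical model of the PDAEMON timer (TIMER_* registers)."""
    start: int              # TIMER_START
    periodic: bool          # TIMER_CTRL.MODE: False = ONESHOT, True = PERIODIC
    time: int = 0           # TIMER_TIME
    running: bool = False   # TIMER_CTRL.RUNNING
    intr: bool = False      # TIMER_INTR bit 8 [TIMER], sticky until written with 1

    def set_running(self) -> None:
        # A 0 -> 1 transition of RUNNING copies TIMER_START into TIMER_TIME.
        # Reaching 0 through this copy never raises the interrupt.
        if not self.running:
            self.time = self.start
        self.running = True

    def rising_edge(self) -> bool:
        """Apply one source clock rising edge; return True if it raised the interrupt."""
        if not self.running:
            return False
        if self.time == 0:
            # ONESHOT: nothing happens. PERIODIC: reload, again without an interrupt.
            if self.periodic:
                self.time = self.start
            return False
        self.time -= 1
        if self.time == 0:
            self.intr = True
            return True
        return False


def interrupt_edges(start: int, periodic: bool, n_edges: int) -> list:
    """Return the 1-based indices of the rising edges that raise TIMER_INTR."""
    timer = FalconTimerModel(start=start, periodic=periodic)
    timer.set_running()
    return [i for i in range(1, n_edges + 1) if timer.rising_edge()]


print(interrupt_edges(3, periodic=True, n_edges=12))   # [3, 7, 11]
print(interrupt_edges(3, periodic=False, n_edges=12))  # [3]
print(interrupt_edges(0, periodic=True, n_edges=12))   # []
```

In the model, PERIODIC interrupts arrive every TIMER_START + 1 edges (TIMER_START decrements plus one reload edge), and a TIMER_START of 0 never produces an interrupt.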


Channel switching 


Todo: write me 


PMC interrupt redirection 


One of PDAEMON's powers is redirecting the PMC_INTR_HOST interrupt to itself. The redirection hw may be in one
of two states:


* HOST: PMC_INTR_HOST output connected to PCI interrupt line [ORed with PMC_INTR_NRHOST output],
PDAEMON falcon interrupt #15 disconnected and forced to 0
* DAEMON: PMC_INTR_HOST output connected to PDAEMON falcon interrupt #15 [IREDIR_PMC], PCI
interrupt line connected to PMC_INTR_NRHOST output only


In addition, there's a capability enabling host to send “please turn redirect status back to HOST” interrupt with a 
timeout mechanism that will execute the request in hardware if the PDAEMON fails to respond to the interrupt in a 
given time. 


Note that, as a side effect of having this circuitry, the PMC_INTR_HOST line will be delivered nowhere [falcon interrupt
line #15 will be 0, PCI interrupt line will be connected to PMC_INTR_NRHOST only] whenever the IREDIR circuitry is
in reset state, due to either whole PDAEMON reset through PMC.ENABLE / PDAEMON_ENABLE or DAEMON
circuitry reset via SUBENGINE_RESET with DAEMON set in the reset mask.
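The two redirection states and the reset case reduce to a small truth table. The function below is a hypothetical sketch of that routing (`route_pmc_host` is not a real API): it treats each interrupt as a simple level, ignores timing, and uses `in_reset` to stand for the IREDIR circuitry being held in reset.

```python
def route_pmc_host(state: str, intr_host: bool, intr_nrhost: bool,
                   in_reset: bool = False) -> tuple:
    """Return (pci_interrupt_line, falcon_interrupt_15) for the given levels.

    state is "HOST" or "DAEMON" (the IREDIR_STATUS value).
    """
    if in_reset:
        # While IREDIR is in reset, PMC_INTR_HOST is delivered nowhere.
        return intr_nrhost, False
    if state == "HOST":
        # PCI line sees HOST ORed with NRHOST; falcon #15 is forced to 0.
        return intr_host or intr_nrhost, False
    if state == "DAEMON":
        # PCI line sees NRHOST only; HOST goes to falcon #15 [IREDIR_PMC].
        return intr_nrhost, intr_host
    raise ValueError(f"unknown redirect state: {state!r}")


print(route_pmc_host("DAEMON", intr_host=True, intr_nrhost=False))  # (False, True)
```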


The redirection state may be read at: 


MMIO 0x690 / I[0x1a400]: IREDIR_STATUS Read-only. Reads as 0 if redirect hw is in HOST state, 1 if it's in
DAEMON state.


The redirection state may be controlled by: 
MMIO 0x68c / I[0x1a300]: IREDIR_TRIGGER This register is write-only.


* bit 0: HOST_REQ - when written as 1, sends the “request redirect state change to HOST” interrupt, setting
SUBINTR bit #6 [IREDIR_HOST_REQ] to 1 and starting the timeout, if enabled. When written as 1 while
redirect hw is already in HOST state, will just cause a HOST_REQ_REDUNDANT error instead.


* bit 4: DAEMON - when written as 1, sets the redirect hw state to DAEMON. If it was set to DAEMON
already, causes a DAEMON_REDUNDANT error.


* bit 12: HOST - when written as 1, sets the redirect hw state to HOST. If it was set to HOST already, causes
a HOST_REDUNDANT error. Does not clear the IREDIR_HOST_REQ interrupt bit.


Writing a value with multiple bits set is not a good idea - one of them will cause an error. 


The IREDIR_HOST_REQ interrupt state should be cleared by writing 1 to the corresponding SUBINTR bit. Once
this is done, the timeout counting stops, and the redirect hw goes to HOST state if it wasn't already.


The IREDIR_HOST_REQ timeout is controlled by the following registers:
MMIO 0x694 / I[0x1a500]: IREDIR_TIMEOUT The timeout duration in daemon cycles. Read/write, 32-bit.


MMIO 0x6a4 / I[0x1a900]: IREDIR_TIMEOUT_ENABLE The timeout enable. Only bit 0 is valid. When set to
0, the timeout mechanism is disabled; when set to 1, it's active. Read/write.


When the timeout mechanism is enabled and the IREDIR_HOST_REQ interrupt is triggered, a hidden counter starts counting
down. If IREDIR_TIMEOUT cycles go by without the interrupt being acked, the redirect hw goes to HOST state, the
interrupt is cleared, and a HOST_REQ_TIMEOUT error is triggered.
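The trigger, acknowledge and timeout rules above form a small state machine. The class below is a hypothetical behavioural sketch, not a driver API: method names are invented, errors are recorded by their IREDIR_ERR_DETAIL names only, and two details are assumptions not stated in the text: the switch fires after exactly IREDIR_TIMEOUT daemon cycles, and acknowledging a request that is not pending does nothing.

```python
class IredirModel:
    """Hypothetical behavioural model of the IREDIR circuitry."""

    def __init__(self, timeout: int = 0, timeout_enable: bool = False):
        self.state = "HOST"                   # IREDIR_STATUS: "HOST" or "DAEMON"
        self.host_req = False                 # SUBINTR bit #6 [IREDIR_HOST_REQ]
        self.errors = set()                   # pending IREDIR_ERR_DETAIL error names
        self.timeout = timeout                # IREDIR_TIMEOUT, in daemon cycles
        self.timeout_enable = timeout_enable  # IREDIR_TIMEOUT_ENABLE bit 0
        self._remaining = None                # hidden countdown, None when idle

    # IREDIR_TRIGGER writes, one bit at a time
    def trigger_host_req(self) -> None:
        if self.state == "HOST":
            self.errors.add("HOST_REQ_REDUNDANT")
            return
        self.host_req = True
        if self.timeout_enable:
            self._remaining = self.timeout

    def trigger_daemon(self) -> None:
        if self.state == "DAEMON":
            self.errors.add("DAEMON_REDUNDANT")
        self.state = "DAEMON"

    def trigger_host(self) -> None:
        if self.state == "HOST":
            self.errors.add("HOST_REDUNDANT")
        self.state = "HOST"  # IREDIR_HOST_REQ is deliberately left pending

    def ack_host_req(self) -> None:
        """Writing 1 to SUBINTR bit #6."""
        if not self.host_req:
            return  # assumption: acking a non-pending request has no effect
        self.host_req = False
        self._remaining = None
        self.state = "HOST"

    def daemon_cycle(self) -> None:
        """Advance the hidden timeout counter by one daemon cycle."""
        if self._remaining is None:
            return
        self._remaining -= 1
        if self._remaining <= 0:
            # Timed out: the hardware performs the switch to HOST itself.
            self._remaining = None
            self.host_req = False
            self.state = "HOST"
            self.errors.add("HOST_REQ_TIMEOUT")
```

A typical PDAEMON-side flow is `trigger_daemon()` at startup, then `ack_host_req()` from the SUBINTR #6 handler once the host asks for its interrupt back; if the handler never runs, `daemon_cycle()` eventually forces the HOST state.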


The redirect hw errors will trigger the IREDIR_ERR interrupt, which is connected to SUBINTR bit #5. The registers 
involved are: 


MMIO 0x698 / I[0x1a600]: IREDIR_ERR_DETAIL Read-only, shows detailed error status. All bits are auto-cleared
when IREDIR_ERR_INTR is cleared.


* bit 0: HOST_REQ_TIMEOUT - set when the IREDIR_HOST_REQ interrupt times out
* bit 4: HOST_REQ_REDUNDANT - set when HOST_REQ is poked in IREDIR_TRIGGER while the hw
is already in HOST state
* bit 8: DAEMON_REDUNDANT - set when DAEMON is poked in IREDIR_TRIGGER while the hw is
already in DAEMON state
* bit 12: HOST_REDUNDANT - set when HOST is poked in IREDIR_TRIGGER while the hw is already
in HOST state
MMIO 0x69c / I[0x1a700]: IREDIR_ERR_INTR The status register for the IREDIR_ERR interrupt. Only bit 0 is
valid. Set when any of the 4 errors happens, cleared [along with all IREDIR_ERR_DETAIL bits] when 1 is
written to bit 0. When this and IREDIR_ERR_INTR_EN are both set, the IREDIR_ERR [#5] second-level
interrupt line to SUBINTR is asserted.


MMIO 0x6a0 / I[0x1a800]: IREDIR_ERR_INTR_EN The enable register for the IREDIR_ERR interrupt. Only bit 0
is valid.


The interrupt redirection circuitry also exports the following signals to PCOUNTER:
* IREDIR_STATUS: current redirect hw status, like the IREDIR_STATUS reg.
* IREDIR_HOST_REQ: 1 if the IREDIR_HOST_REQ [SUBINTR #6] interrupt is pending
* IREDIR_TRIGGER_DAEMON: pulses for 1 cycle whenever IREDIR_TRIGGER.DAEMON is written as 1,
whether it results in an error or not
* IREDIR_TRIGGER_HOST: pulses for 1 cycle whenever IREDIR_TRIGGER.HOST is written as 1, whether it
results in an error or not
* IREDIR_PMC: 1 if the PMC_INTR_HOST line is active and directed to DAEMON [ie. mirrors falcon interrupt
#15 input]
* IREDIR_INTR: 1 if any IREDIR interrupt is active - IREDIR_HOST_REQ, IREDIR_ERR, or IREDIR_PMC.
IREDIR_ERR does not count if IREDIR_ERR_INTR_EN is not set.


PTHERM interface 


PDAEMON can access all PTHERM registers directly, without having to go through the generic MMIO access
functionality. The THERM range in the PDAEMON register space is mapped straight to the PTHERM MMIO register range.


On GT215:GF119, PTHERM registers are mapped into the I[] space at addresses 0x20000:0x40000, with addresses
being shifted left by 6 bits wrt their address in PTHERM - PTHERM register 0x20000+x would be visible at I[0x20000
+ x * 0x40] by falcon, or at 0x10a800+x in MMIO [assuming it wouldn't fall into the reserved 0x10afe0:0x10b000
range]. On GF119+, the PTHERM registers are instead mapped into the I[] space at addresses 0x1000:0x1800, without
shifting - PTHERM reg 0x20000+x is visible at I[0x1000+x]. On GF119+, the alias area is not visible via MMIO [just
access PTHERM registers directly instead].
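The address translation above can be written as two small helpers. This is a sketch with hypothetical function names; the 0x800-byte window size is derived from the stated ranges (0x20000:0x40000 shifted right by 6, and 0x1000:0x1800), and the reserved MMIO range is taken from the text.

```python
from typing import Optional, Tuple

PTHERM_BASE = 0x20000
ALIAS_SIZE = 0x800  # both alias windows cover PTHERM offsets 0x0:0x800


def ptherm_alias_gt215(reg: int) -> Tuple[int, Optional[int]]:
    """GT215:GF119: return (falcon I[] address, PDAEMON MMIO address or None)."""
    x = reg - PTHERM_BASE
    if not 0 <= x < ALIAS_SIZE:
        raise ValueError(f"PTHERM register {reg:#x} is outside the alias window")
    io_addr = 0x20000 + x * 0x40            # offset shifted left by 6 bits
    mmio_addr = 0x10a800 + x
    if 0x10afe0 <= mmio_addr < 0x10b000:    # reserved range, not reachable via MMIO
        mmio_addr = None
    return io_addr, mmio_addr


def ptherm_alias_gf119(reg: int) -> int:
    """GF119+: return the falcon I[] address (there is no MMIO alias)."""
    x = reg - PTHERM_BASE
    if not 0 <= x < ALIAS_SIZE:
        raise ValueError(f"PTHERM register {reg:#x} is outside the alias window")
    return 0x1000 + x


print([hex(a) for a in ptherm_alias_gt215(0x20404)])  # ['0x30100', '0x10ac04']
```

The GT215 results agree with the general PDAEMON pattern visible throughout this section, where an I[] address is the PDAEMON MMIO offset shifted left by 6 (e.g. MMIO 0x4e0 and I[0x13800]): (0x10ac04 - 0x10a000) << 6 = 0x30100.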


Reads to the PTHERM-mapped area will always perform 32-bit reads to the corresponding PTHERM regs. Writes, 
however, have their byte enable mask controlled via a PDAEMON register, enabling writes with sizes other than 
32-bit: 


MMIO 0x5f4 / I[0x17d00]: THERM_BYTE_MASK Read/write, only low 4 bits are valid, initialised to 0xf on
reset. Selects the byte mask to use when writing the THERM range. Bit i corresponds to bits i*8..i*8+7 of the
written 32-bit value.
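The effect of THERM_BYTE_MASK on a write can be illustrated by merging the written value into the old register contents lane by lane. This is a sketch assuming standard byte-enable semantics (disabled byte lanes keep their previous value in the PTHERM register); `masked_write` is a hypothetical helper, not a hardware interface.

```python
def masked_write(old: int, new: int, byte_mask: int) -> int:
    """Merge a 32-bit write into the old register value under THERM_BYTE_MASK."""
    result = old
    for i in range(4):
        if (byte_mask >> i) & 1:
            lane = 0xff << (i * 8)               # bits i*8..i*8+7
            result = (result & ~lane) | (new & lane)
    return result & 0xffffffff


print(hex(masked_write(0x11223344, 0xaabbccdd, 0x6)))  # 0x11bbcc44
```

For example, an 8-bit write to byte 2 of a PTHERM register would use THERM_BYTE_MASK = 0x4 for the duration of the write, then restore 0xf.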


The PTHERM access circuitry also exports a signal to PCOUNTER: 


* THERM_ACCESS_BUSY: 1 while a THERM range read/write is in progress - will light up for a dozen or so
cycles per access, depending on relative clock speeds.


In addition to direct register access to PTHERM, PDAEMON also has direct access to PTHERM interrupts - falcon
interrupt #12 [THERM] comes from the PTHERM interrupt aggregator. PTHERM subinterrupts can be individually
assigned for PMC or PDAEMON delivery - see ptherm-intr for more information.


Idle counters 
Contents


* Idle counters 


— Introduction 


- MMIO Registers 


Introduction 


PDAEMON’s role is mostly about power management. One of the most effective ways of lowering the power
consumption is to lower the voltage at which the processor is powered. Lowering the voltage is also likely to require
lowering the clocks of the engines powered by this power domain. Lowering the clocks lowers the performance, which
means it can only be done to engines that are under-utilized. This technique is called Dynamic Voltage/Frequency
Scaling (DVFS) and requires being able to read the activity level (busyness) of the engines clocked by every clock
domain.


PDAEMON could use PCOUNTER to read the busyness of the engines it needs to reclock, but that would be a waste
of counters. Indeed, unlike PCOUNTER, which needs to be able to count events, the busyness of an engine can be
polled at any frequency depending on the level of accuracy wanted. Moreover, configuring PCOUNTER
both in the host driver and in PDAEMON would likely require some unwanted synchronization.


This is most likely why some counters were added to PDAEMON. Those counters poll idle signals coming
from the monitored engines. A signal is a binary value that equals 1 when the associated engine is idle, and 0 if it is
active.


Todo: check the frequency at which PDAEMON is polling 


MMIO Registers 


On GT215:GF100, there are 4 counters, while on GF100+ there are 8 of them. Each counter is composed of 3
registers: the mask, the mode and the actual count. There are two counting modes. The first one increments the
counter every time all bits of COUNTER_SIGNALS selected by the mask are set. The other mode only increments
when all the selected bits are cleared. It is possible to set both modes at the same time, which results in incrementing
at every clock cycle. This mode is interesting because it allows dedicating a counter to time-keeping, which eases
translating the other counters' values to an idling percentage. This allows for aperiodic polling of the counters
without needing to store the last polling time.
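The counting rules and the time-keeping trick can be sketched as follows. One assumption must be stated loudly: the text says the second mode counts when all selected bits are cleared, yet also says that setting both modes counts every cycle; those two statements only agree when the mask selects a single bit. This sketch therefore follows the register name INCR_IF_NOT_ALL and treats it as "not all selected bits set". All function names are hypothetical.

```python
INCR_IF_ALL = 1 << 0      # COUNTER_MODE bit 0
INCR_IF_NOT_ALL = 1 << 1  # COUNTER_MODE bit 1
COUNT_MASK = 0x7fffffff   # COUNTER_COUNT bits 0-30
PVLD_IDLE = 1 << 4        # COUNTER_SIGNALS bit 4


def counter_increments(signals: int, mask: int, mode: int) -> bool:
    """Decide whether one counter increments for one sample of COUNTER_SIGNALS."""
    all_set = (signals & mask) == mask
    if mode & INCR_IF_ALL and all_set:
        return True
    # Assumption: NOT_ALL means "not all masked bits set", so ALL|NOT_ALL counts every cycle.
    if mode & INCR_IF_NOT_ALL and not all_set:
        return True
    return False


def run_counters(samples, counters):
    """counters is a list of (mask, mode) pairs; returns the final counts."""
    counts = [0] * len(counters)
    for signals in samples:
        for i, (mask, mode) in enumerate(counters):
            if counter_increments(signals, mask, mode):
                counts[i] = (counts[i] + 1) & COUNT_MASK
    return counts


def idle_percent(idle_count: int, clock_count: int) -> float:
    """Idle share of the elapsed cycles, using a time-keeping counter as the base."""
    return 100.0 * idle_count / clock_count if clock_count else 0.0


samples = [PVLD_IDLE, 0, PVLD_IDLE, PVLD_IDLE]          # PVLD idle in 3 of 4 samples
counts = run_counters(samples, [
    (PVLD_IDLE, INCR_IF_ALL),                           # idle cycles
    (PVLD_IDLE, INCR_IF_ALL | INCR_IF_NOT_ALL),         # time-keeping
    (PVLD_IDLE, INCR_IF_NOT_ALL),                       # busy cycles
])
print(counts)                                # [3, 4, 1]
print(idle_percent(counts[0], counts[1]))    # 75.0
```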


The counters are not double-buffered and are independent. This means every counter needs to be read then reset at
roughly the same time if synchronization between the counters is required. Resetting a counter is done by setting bit
31 of COUNTER_COUNT.


MMIO 0x500 / I[0x14000]: COUNTER_SIGNALS Read-only. Bitfield with each bit indicating the instantaneous
state of the associated engines/blocks. When the bit is set, the engine/block is idle; when it is cleared, the
engine/block is active.

* bit 0: GR_IDLE
* bit 4: PVLD_IDLE
* bit 5: PPDEC_IDLE
* bit 6: PPPP_IDLE
* bit 7: MC_IDLE [GF100-]
* bit 8: MC_IDLE [GT215:GF100]
* bit 19: PCOPY0_IDLE
* bit 20: PCOPY1_IDLE [GF100-]
* bit 21: PCOPY2_IDLE [GK104-]


MMIO 0x504+i*0x10 / I[0x14100+i*0x400]: COUNTER_MASK The mask that will be applied on
COUNTER_SIGNALS before applying the logic set by COUNTER_MODE.


MMIO 0x508+i*0x10 / I[0x14200+i*0x400]: COUNTER_COUNT
* bits 0-30: COUNT
* bit 31: CLEAR_TRIGGER : Write-only, resets the counter.
MMIO 0x50c+i*0x10 / I[0x14300+i*0x400]: COUNTER_MODE
* bit 0: INCR_IF_ALL : Increment the counter if all the masked bits are set
* bit 1: INCR_IF_NOT_ALL : Increment the counter if all the masked bits are cleared
* bit 2: UNK2 [GF119-]
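Since the counters are independent and not double-buffered, a consistent snapshot means reading all COUNTER_COUNT registers first and only then clearing them. The sketch below assumes hypothetical `read32`/`write32` callbacks that take PDAEMON-relative MMIO offsets, as used throughout this section; `num_counters` is 4 on GT215:GF100 and 8 on GF100+.

```python
from typing import Callable, List

CLEAR_TRIGGER = 1 << 31   # COUNTER_COUNT bit 31, write-only
COUNT_MASK = 0x7fffffff   # COUNTER_COUNT bits 0-30


def counter_count_reg(i: int) -> int:
    """PDAEMON MMIO offset of COUNTER_COUNT for counter i (0x508 + i*0x10)."""
    return 0x508 + i * 0x10


def snapshot_and_reset(read32: Callable[[int], int],
                       write32: Callable[[int, int], None],
                       num_counters: int) -> List[int]:
    # Read everything first, then clear everything, keeping the skew between counters small.
    values = [read32(counter_count_reg(i)) & COUNT_MASK for i in range(num_counters)]
    for i in range(num_counters):
        write32(counter_count_reg(i), CLEAR_TRIGGER)
    return values
```

On the falcon side the same registers would be reached at I[] addresses equal to these offsets shifted left by 6 (e.g. 0x518 << 6 = 0x14600 for counter 1).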


General MMIO register access 


PDAEMON can access the whole MMIO range through the IO space. 


To read from an MMIO address, poke the address into MMIO_ADDR, then trigger a read by poking 0x100f1 to
MMIO_CTRL. Wait for MMIO_CTRL’s bits 12-14 to be cleared, then read the value from MMIO_VALUE.


To write to an MMIO address, poke the address into MMIO_ADDR, poke the value to be written into MMIO_VALUE,
then trigger a write by poking 0x100f2 to MMIO_CTRL. Wait for MMIO_CTRL’s bits 12-14 to be cleared if you want
to make sure the write has been completed.
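The two procedures above can be sketched in Python. `iowr`/`iord` are hypothetical stand-ins for the falcon I/O-space write/read operations, taking the I[] addresses listed below for MMIO_ADDR, MMIO_VALUE and MMIO_CTRL. The 0x100f1/0x100f2 constants decompose into MMIO_CTRL fields as request (bits 0-1), BYTE_MASK 0xf (bits 4-7) and TRIGGER (bit 16). The `max_polls` bound is an addition of this sketch so a stuck request cannot spin forever; real code would inspect MMIO_ERR in that case.

```python
from typing import Callable

# Falcon I[] addresses of the PDAEMON MMIO access registers.
MMIO_ADDR = 0x1e800   # MMIO 0x7a0
MMIO_VALUE = 0x1e900  # MMIO 0x7a4
MMIO_CTRL = 0x1eb00   # MMIO 0x7ac

CTRL_READ = 0x100f1       # TRIGGER | BYTE_MASK 0xf | request 1 (read)
CTRL_WRITE = 0x100f2      # TRIGGER | BYTE_MASK 0xf | request 2 (write)
CTRL_PENDING = 0x7 << 12  # bits 12-14: BUSY, TIMEOUT, FAULT

IoWrite = Callable[[int, int], None]
IoRead = Callable[[int], int]


def _wait_done(iord: IoRead, max_polls: int) -> None:
    for _ in range(max_polls):
        if (iord(MMIO_CTRL) & CTRL_PENDING) == 0:
            return
    raise TimeoutError("MMIO_CTRL bits 12-14 did not clear")


def mmio_read(iowr: IoWrite, iord: IoRead, addr: int, max_polls: int = 1000) -> int:
    iowr(MMIO_ADDR, addr)
    iowr(MMIO_CTRL, CTRL_READ)
    _wait_done(iord, max_polls)
    return iord(MMIO_VALUE)


def mmio_write(iowr: IoWrite, iord: IoRead, addr: int, value: int,
               max_polls: int = 1000) -> None:
    iowr(MMIO_ADDR, addr)
    iowr(MMIO_VALUE, value)
    iowr(MMIO_CTRL, CTRL_WRITE)
    _wait_done(iord, max_polls)  # only needed to know that the write has completed
```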


Accessing a nonexistent address will set MMIO_CTRL’s bit 13 after MMIO_TIMEOUT cycles have passed.


GF119 introduced the possibility to choose from which access point the MMIO request should be sent. ROOT can
access everything; IBUS can access everything except PMC, PBUS, PFIFO, PPCI and a few other top-level MMIO
ranges. On GF119+, accessing a nonexistent address with the ROOT access point can lead to a hard-lock. XXX:
What's the point of this feature?


It is possible to get an interrupt when an error occurs by poking 1 to MMIO_INTR_EN. The interrupt will be fired on
line 11. The error is described in MMIO_ERR.


MMIO 0x7a0 / I[0x1e800]: MMIO_ADDR Specifies the MMIO address that will be written to/read from by
MMIO_CTRL.


On GT215:GF119, this register only contains the address to be accessed.
On GF119+, this register became a bitfield:
* bits 0-25: ADDR
* bit 27: ACCESS_POINT
  - 0: ROOT
  - 1: IBUS
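On GF119+, the value poked into MMIO_ADDR combines the target address with the access point bit. A minimal sketch with a hypothetical helper name; choosing IBUS for probing possibly-absent registers follows from the warning that ROOT accesses to nonexistent addresses can hard-lock.

```python
ADDR_MASK = (1 << 26) - 1     # bits 0-25: ADDR
ACCESS_POINT_IBUS = 1 << 27   # bit 27: ACCESS_POINT (0 = ROOT, 1 = IBUS)


def encode_mmio_addr_gf119(addr: int, via_ibus: bool) -> int:
    """Build a GF119+ MMIO_ADDR value for the given address and access point."""
    if addr & ~ADDR_MASK:
        raise ValueError(f"address {addr:#x} does not fit in 26 bits")
    return addr | (ACCESS_POINT_IBUS if via_ibus else 0)


print(hex(encode_mmio_addr_gf119(0x88000, via_ibus=True)))  # 0x8088000
```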


MMIO 0x7a4 / I[0x1e900]: MMIO_VALUE The value that will be written to / is read from MMIO_ADDR when
an operation is triggered by MMIO_CTRL.


MMIO 0x7a8 / I[0x1ea00]: MMIO_TIMEOUT Specifies the timeout for MMIO access. XXX: Clock source?
PDAEMON’s core clock, PTIMER's, Host's?


MMIO 0x7ac / I[0x1eb00]: MMIO_CTRL Process the MMIO request with given params (MMIO_ADDR,
MMIO_VALUE).
* bits 0-1: request
  - 0: XXX
  - 1: read
  - 2: write
  - 3: XXX
* bits 4-7: BYTE_MASK
* bit 12: BUSY [RO]
* bit 13: TIMEOUT [RO]
* bit 14: FAULT [RO]
* bit 16: TRIGGER

MMIO 0x7b0 / I[0x1ec00]: MMIO_ERR
Specifies the MMIO error status:
* TIMEOUT: ROOT/IBUS has not answered PDAEMON's request
* CMD_WHILE_BUSY: a request has been fired while being busy
* WRITE: set if the request was a write, cleared if it was a read
* FAULT: No engine answered ROOT/IBUS’s request


On GT215:GF119, clearing MMIO_INTR's bit 0 will also clear MMIO_ERR. On GF119+, clearing
MMIO_ERR is done by poking 0xffffffff.


* GT215:GF100: bit 0: TIMEOUT, bit 1: CMD_WHILE_BUSY, bit 2: WRITE, bits 3-31: ADDR
* GF100:GF119: bit 0: TIMEOUT, bit 1: CMD_WHILE_BUSY, bit 2: WRITE, bits 3-30: ADDR, bit 31: FAULT
* GF119+: bit 0: TIMEOUT_ROOT, bit 1: TIMEOUT_IBUS, bit 2: CMD_WHILE_BUSY, bit 3: WRITE,
  bits 4-29: ADDR, bit 30: FAULT_ROOT, bit 31: FAULT_IBUS
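The three MMIO_ERR layouts can be decoded with one table-driven helper. This is a sketch with hypothetical names; note that the ADDR field is returned as its raw bitfield value, and whether that equals the faulting address itself or a shifted form of it is an assumption the text does not settle.

```python
# (lo, hi) bit ranges of each MMIO_ERR field, per GPU range, as listed above.
MMIO_ERR_LAYOUTS = {
    "GT215:GF100": {"TIMEOUT": (0, 0), "CMD_WHILE_BUSY": (1, 1), "WRITE": (2, 2),
                    "ADDR": (3, 31)},
    "GF100:GF119": {"TIMEOUT": (0, 0), "CMD_WHILE_BUSY": (1, 1), "WRITE": (2, 2),
                    "ADDR": (3, 30), "FAULT": (31, 31)},
    "GF119+": {"TIMEOUT_ROOT": (0, 0), "TIMEOUT_IBUS": (1, 1), "CMD_WHILE_BUSY": (2, 2),
               "WRITE": (3, 3), "ADDR": (4, 29), "FAULT_ROOT": (30, 30),
               "FAULT_IBUS": (31, 31)},
}


def decode_mmio_err(value: int, gpu_range: str) -> dict:
    """Split a raw MMIO_ERR value into named fields for the given GPU range."""
    fields = {}
    for name, (lo, hi) in MMIO_ERR_LAYOUTS[gpu_range].items():
        fields[name] = (value >> lo) & ((1 << (hi - lo + 1)) - 1)
    return fields


print(decode_mmio_err(0x80012349, "GF119+")["ADDR"] == 0x1234)  # True
```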


MMIO 0x7b4 / I[0x1ed00]: MMIO_INTR Specifies which MMIO interrupts are active. Clear the associated bit to
ACK.
* bit 0: ERR


Clearing this bit will also clear MMIO_ERR on GT215:GF119.


MMIO 0x7b8 / I[0x1ee00]: MMIO_INTR_EN Specifies which MMIO interrupts are enabled. Interrupts will be
fired on SUBINTR #4.
* bit 0: ERR


Engine power gating 


Todo: write me 


Input/output signals 


Contents 


* Input/output signals 


— Introduction 


- Interrupts 


Todo: write me 


Introduction 


Todo: write me 
Interrupts


Todo: write me 


Introduction 


PDAEMON is a falcon-based engine introduced on GT215. Its main purpose is autonomous power and thermal
management, but it can be used to oversee any part of GPU operation. The PDAEMON has many dedicated connections
to various parts of the GPU.


The PDAEMON is made of:
* a falcon microprocessor core
* standard falcon memory interface unit
* a simple channel load interface, replacing the usual PFIFO interface
* various means of communication between falcon and host
* engine power gating controllers for the PFIFO-connected engines
* "idle" signals from various engines and associated idle counters
* misc simple input/output signals to various engines, with interrupt capability
* a oneshot/periodic timer, using daemon clock or PTIMER as clock source
* PMC interrupt redirection circuitry
* indirect MMIO access circuitry
* direct interface to all PTHERM registers
* CRC computation hardware


Todo: and unknown stuff. 


There are 5 revisions of PDAEMON:
* v0: GT215:MCP89? - the original revision
* v1: MCP89:GF100 - added a third instance of power gating controller for PVCOMP engine
* v2: GF100:GF119 - removed PVCOMP support, added second set of input/output signals and ???
* v3: GF119:GK104 - changed I[] space layout, added ???
* v4: GK104- - a new version of engine power gating controller and ???


Todo: figure out additions 


Todo: this file deals mostly with GT215 version now 
2.5.3 NV43:G80 thermal monitoring


Contents 


* NV43:G80 thermal monitoring
- Introduction 
— MMIO register list 
- The ADC clock
— Reading temperature 
— Setting up thresholds and interrupts 
* Alarm 


* Temperature range 


— Extended configuration 


Introduction 


THERM is an area present in PBUS on NV43:G80 GPUs. This area is responsible for temperature monitoring, probably
on low-end and middle-range GPUs, since high-end cards have been using LM89/ADT7473 for a long time. Besides
some configuration knobs, THERM can generate IRQs to the HOST when the temperature goes over a configurable
ALARM threshold or outside a configurable temperature range. This MMIO area has been replaced by PTHERM on G80+
GPUs.


THERM's MMIO range is 0x15b0:0x15c0. There are two major variants of this range:
* NV43:G70 
* G70:G80 


MMIO register list 


Address  | Present on | Name       | Description
0x0015b0 | all        | CFG0       | sensor enable / IRQ enable / ALARM configuration
0x0015b4 | all        | STATUS     | sensor state / ALARM state / ADC rate configuration
0x0015b8 | non-IGP    | CFG1       | misc. configuration
0x0015bc | all        | TEMP_RANGE | LOW and HIGH temperature thresholds


MMIO 0x15b0: CFG0 [NV43:G70]
* bits 0-7: ALARM_HIGH
* bits 16-23: SENSOR_OFFSET (signed integer)
* bit 24: DISABLE
* bit 28: ALARM_INTR_EN
MMIO 0x15b0: CFG0 [G70:G80]
* bits 0-13: ALARM_HIGH
* bits 16-29: SENSOR_OFFSET (signed integer)
* bit 30: DISABLE
* bit 31: ENABLE
MMIO 0x15b4: STATUS [NV43:G70]
* bits 0-7: SENSOR_RAW
* bit 8: ALARM_HIGH
* bits 25-31: ADC_CLOCK_XXX


Todo: figure out what divisors are available 


MMIO 0x15b4: STATUS [G70:G80]
* bits 0-13: SENSOR_RAW
* bit 16: ALARM_HIGH
* bits 26-31: ADC_CLOCK_DIV The division is stored right-shifted 4. The possible division values range
from 32 to 2016 with the possibility to completely bypass the divider.


MMIO 0x15b8: CFG1 [NV43:G70]
* bit 17: ADC_PAUSE
* bit 23: CONNECT_SENSOR
MMIO 0x15bc: TEMP_RANGE [NV43:G70]
* bits 0-7: LOW
* bits 8-15: HIGH
MMIO 0x15bc: TEMP_RANGE [G70:G80]
* bits 0-13: LOW
* bits 16-29: HIGH


The ADC clock 


The source clock for THERM's ADC is:
* NV43:G70: the host clock
* G70:G80: constant (most likely hclck, since the rate doesn't change when the HOST clock is changed)
Before reaching the ADC, the clock source is divided by a fixed divider of 1024 and then by ADC_CLOCK_DIV.
MMIO 0x15b4: STATUS [NV43:G70]
* bits 25-31: ADC_CLOCK_DIV


Todo: figure out what divisors are available 


MMIO 0x15b4: STATUS [G70:G80]


* bits 26-31: ADC_CLOCK_DIV The division is stored right-shifted 4. The possible division values range
from 32 to 2016 with the possibility to completely bypass the divider.
The final ADC clock is:
ADC clock = source clock / (1024 * ADC_CLOCK_DIV)


The accuracy of the reading greatly depends on the ADC clock. A clock that is too fast will produce a lot of noise. A
clock that is too slow may actually produce an offset value. An ADC clock rate under 10 kHz is advised, based on limited
testing on a G73.


Todo: Make sure this clock range is safe on all cards 


Anyway, it seems to be clocked at an acceptable frequency at boot time, so there is no need to worry too much about it.


Reading temperature 


Temperature is read from:
MMIO 0x15b4: STATUS [NV43:G70] bits 0-7: SENSOR_RAW
MMIO 0x15b4: STATUS [G70:G80] bits 0-13: SENSOR_RAW
SENSOR_RAW is the result of the (signed) addition of the actual value read by the ADC and SENSOR_OFFSET:
MMIO 0x15b0: CFG0 [NV43:G70]
* bits 16-23: SENSOR_OFFSET signed
MMIO 0x15b0: CFG0 [G70:G80]
* bits 16-29: SENSOR_OFFSET signed
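Extracting SENSOR_OFFSET needs a sign extension of an 8-bit or 14-bit field. The sketch below uses the same `sext(val, bit)` convention as the notational conventions chapter, where `bit` is the index of the sign bit; `sensor_offset` and `expected_sensor_raw` are hypothetical helpers. Whether the hardware wraps or saturates the sum is not documented, so the plain signed sum is returned.

```python
OFFSET_FIELD = {           # (lowest bit, sign bit index within the field)
    "NV43:G70": (16, 7),   # CFG0 bits 16-23
    "G70:G80": (16, 13),   # CFG0 bits 16-29
}


def sext(val: int, bit: int) -> int:
    """Sign-extend val, treating `bit` as the sign bit."""
    val &= (1 << (bit + 1)) - 1
    if (val >> bit) & 1:
        val -= 1 << (bit + 1)
    return val


def sensor_offset(cfg0: int, gpu_range: str) -> int:
    lo, sign_bit = OFFSET_FIELD[gpu_range]
    return sext(cfg0 >> lo, sign_bit)


def expected_sensor_raw(adc_value: int, cfg0: int, gpu_range: str) -> int:
    # Wrapping/saturation of the hardware sum is not documented.
    return adc_value + sensor_offset(cfg0, gpu_range)


print(sensor_offset(0x01f60000, "NV43:G70"))  # -10 (bit 24, DISABLE, is ignored)
```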


Starting temperature readouts requires flipping a few switches that are GPU-dependent:
MMIO 0x15b0: CFG0 [NV43:G70]
* bit 24: DISABLE
MMIO 0x15b0: CFG0 [G70:G80]
* bit 30: DISABLE - mutually exclusive with ENABLE
* bit 31: ENABLE - mutually exclusive with DISABLE
MMIO 0x15b8: CFG1 [NV43:G70]
* bit 17: ADC_PAUSE
* bit 23: CONNECT_SENSOR
Both DISABLE and ADC_PAUSE should be clear. ENABLE and CONNECT_SENSOR should be set.
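The switch settings above amount to a read-modify-write of CFG0, and of CFG1 where it exists (non-IGP). The helper below is a hypothetical sketch that only computes the new register values and leaves all other bits alone; the actual register access is up to the caller, and since other switches may exist, it is not guaranteed to be sufficient on every card.

```python
from typing import Optional, Tuple

CFG0_DISABLE_NV43 = 1 << 24
CFG0_DISABLE_G70 = 1 << 30
CFG0_ENABLE_G70 = 1 << 31
CFG1_ADC_PAUSE = 1 << 17
CFG1_CONNECT_SENSOR = 1 << 23


def sensor_enable_values(gpu_range: str, cfg0: int,
                         cfg1: Optional[int]) -> Tuple[int, Optional[int]]:
    """Return new (CFG0, CFG1) values with temperature readout switched on.

    Pass cfg1=None when CFG1 is absent (IGPs) or its bits are not documented (G70:G80).
    """
    if gpu_range == "NV43:G70":
        cfg0 &= ~CFG0_DISABLE_NV43
        if cfg1 is not None:
            cfg1 = (cfg1 & ~CFG1_ADC_PAUSE) | CFG1_CONNECT_SENSOR
    elif gpu_range == "G70:G80":
        cfg0 = (cfg0 & ~CFG0_DISABLE_G70) | CFG0_ENABLE_G70
    else:
        raise ValueError(f"unknown GPU range: {gpu_range!r}")
    return cfg0 & 0xffffffff, cfg1
```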


Todo: There may be other switches. 


Setting up thresholds and interrupts 


Alarm 


THERM features the ability to set up an alarm that will trigger interrupt PBUS #16 when SENSOR_RAW >
ALARM_HIGH. NV43-47 GPUs require ALARM_INTR_EN to be set in order to get the IRQ. You may need to
set bits 0x40001 in 0x15a0 and 1 in 0x15a4. Their purpose has not been understood yet, even though they may be
related to automatic downclocking.


MMIO 0x15b0: CFG0 [NV43:G70]
* bits 0-7: ALARM_HIGH
* bit 28: ALARM_INTR_EN
MMIO 0x15b0: CFG0 [G70:G80]
* bits 0-13: ALARM_HIGH
When SENSOR_RAW > ALARM_HIGH, STATUS.ALARM_HIGH is set.
MMIO 0x15b4: STATUS [NV43:G70]
* bit 8: ALARM_HIGH
MMIO 0x15b4: STATUS [G70:G80]
* bit 16: ALARM_HIGH
STATUS.ALARM_HIGH is cleared as soon as SENSOR_RAW < ALARM_HIGH, without any hysteresis cycle.


Temperature range 


THERM can check that the temperature is inside a range. When the temperature goes outside this range, an interrupt is
sent. The range is defined in the TEMP_RANGE register, where the LOW and HIGH thresholds are set.


MMIO 0x15bc: TEMP_RANGE [NV43:G70]
* bits 0-7: LOW
* bits 8-15: HIGH

MMIO 0x15bc: TEMP_RANGE [G70:G80]
* bits 0-13: LOW
* bits 16-29: HIGH


When SENSOR_RAW < TEMP_RANGE.LOW, interrupt PBUS #17 is sent. When SENSOR_RAW >
TEMP_RANGE.HIGH, interrupt PBUS #18 is sent.


There are no hysteresis cycles on these thresholds.
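The alarm and range conditions can be summarised as a check in raw SENSOR_RAW units. This hypothetical sketch evaluates the documented comparisons as levels and ignores the enable bits (such as ALARM_INTR_EN on NV43-47); whether the hardware raises the PBUS interrupts on level or on edge is not stated here.

```python
TEMP_RANGE_FIELDS = {            # (LOW lowest bit, HIGH lowest bit, field width)
    "NV43:G70": (0, 8, 8),
    "G70:G80": (0, 16, 14),
}


def decode_temp_range(reg: int, gpu_range: str) -> tuple:
    """Return (LOW, HIGH) from a TEMP_RANGE register value."""
    low_lo, high_lo, width = TEMP_RANGE_FIELDS[gpu_range]
    mask = (1 << width) - 1
    return (reg >> low_lo) & mask, (reg >> high_lo) & mask


def pending_pbus_irqs(sensor_raw: int, alarm_high: int, low: int, high: int) -> set:
    """Return the PBUS interrupt numbers whose conditions currently hold."""
    irqs = set()
    if sensor_raw > alarm_high:
        irqs.add(16)   # ALARM
    if sensor_raw < low:
        irqs.add(17)   # below TEMP_RANGE.LOW
    if sensor_raw > high:
        irqs.add(18)   # above TEMP_RANGE.HIGH
    return irqs


low, high = decode_temp_range(0x5028, "NV43:G70")
print(low, high)                                   # 40 80
print(pending_pbus_irqs(90, alarm_high=85, low=low, high=high))
```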


Extended configuration 


Todo: Document reg 15b8 


2.6 GPU external device I/O units


Contents: 
2.6.1 G80:GF119 GPIO lines


Contents 


* G80:GF119 GPIO lines


— Introduction 


- Interrupts 

- G80 GPIO NVIO specials

— G64 GPIO NVIO specials 

— G94 GPIO NVIO specials 

— GT215 GPIO NVIO specials 


Todo: write me 


Introduction 


Todo: write me 


Interrupts 


Todo: write me 


G80 GPIO NVIO specials 


This list applies to G80. 
Line | Output | Input
0 PWM_0
1 ?
2 ?
3 tag 0x42?
4 SLI_SENSE_0?
5 ?
6 -
7 - | PTHERM_INPUT_0
8 - | PTHERM_INPUT_2
9 related to elbc and PTHERM?
10 -
11 SLI_SENSE_1?
12 tag 0x43?
13 tag 0x0f?
14 -
G84 GPIO NVIO specials
This list applies to G84:G94.
Line | Output | Input
4 PWM_0
8 THERM_SHUTDOWN? | PTHERM_INPUT_0
9 PWM_1 | PTHERM_INPUT_1
11 SLI_SENSE_0?
12 PTHERM_INPUT_2
13 tag 0x0f?
14 SLI_SENSE_1?


G94 GPIO NVIO specials 


This list applies to G94:GT215. 


Line | Output | Input
1 AUXCH_HPD_0
4 PWM_0
8 THERM_SHUTDOWN? | PTHERM_INPUT_0
9 PWM_1 | PTHERM_INPUT_1
12 PTHERM_INPUT_2
15 AUXCH_HPD_2
20 AUXCH_HPD_1
21 AUXCH_HPD_3


GT215 GPIO NVIO specials 


This list applies to GT215:GF119. 
Line | Output | Input
1 AUXCH_HPD_0
3 SLI_SENSE?
8 THERM_SHUTDOWN? | PTHERM_INPUT_0
9 PWM_1 | PTHERM_INPUT_1
11 SLI_SENSE?
12 PTHERM_INPUT_2
15 AUXCH_HPD_2
16 PWM_0
19 AUXCH_HPD_1
21 AUXCH_HPD_3
22 tag 0x42?
23 tag 0x0f?
[any] FAN_TACH


2.7 Memory access and structure 


Contents: 


2.7.1 Memory structure 


Contents 


* Memory structure 
- Introduction 
— Memory planes and banks 
— Memory banks, ranks, and subpartitions 


— Memory partitions and subpartitions 


- Memory addressing 


Introduction 
While DRAM is often treated as a flat array of bytes, its internal structure is far more complicated. A good
understanding of it is necessary for high-performance applications like GPUs.
Looking roughly from the bottom up, VRAM is made of:
1. Memory planes of R rows by C columns, with each cell being one bit


2. Memory banks made of 32, 64, or 128 memory planes used in parallel - the planes are usually spread across 
several chips, with one chip containing 16 or 32 memory planes 


3. Memory ranks made of several [2, 4 or 8] memory banks wired together and selected by address bits - all banks 
for a given memory plane reside in the same chip 


4. Memory subpartitions made of one or two memory ranks wired together and selected by chip select wires - 
ranks behave similarly to banks, but don't have to have uniform geometry, and are in separate chips 
5. Memory partitions made of one or two somewhat independent subpartitions


6. The whole VRAM, made of several [1-8] memory partitions 


Memory planes and banks 


The most basic unit of DRAM is a memory plane, which is a 2d array of bits organised in so-called columns and rows.
A memory plane contains a buffer, which holds a whole row. Internally, DRAM is read/written in row units via the
buffer. This has several consequences: 


* before a bit can be operated on, its row must be loaded into the buffer, which is slow 
* after a row is done with, it needs to be written back to the memory array, which is also slow 
* accessing a new row is thus slow, and even slower when there already is an active row 


* it's often useful to preemptively close a row after some inactivity time - such an operation is called “precharging”
a bank


* different columns in the same row, however, can be accessed quickly 


Since loading the column address itself takes more time than actually accessing a bit in the active buffer, DRAM is
accessed in bursts - a series of accesses to 1-8 neighbouring bits in the active row. Usually all bits in a burst have to be
located in a single aligned 8-bit group.


The number of rows and columns in a memory plane is always a power of two, and is measured by the count of row
selection and column selection bits [ie. log2 of the row/column count]. There are typically 8-10 column bits and 10-14
row bits.


The memory planes are organised in banks - groups of some power of two number of memory planes. The memory 
planes are wired in parallel, sharing the address and control wires, with only the data / data enable wires separate. 
This effectively makes a memory bank like a memory plane that’s composed of 32/64/128-bit memory cells instead of 
single bits - all the rules that apply to a plane still apply to a bank, except larger units than a bit are operated on. 


A single memory chip usually contains 16 or 32 memory planes for a single bank, thus several chips are often wired 
together to make wider banks. 


Memory banks, ranks, and subpartitions 


A memory chip contains several [2, 4, or 8] banks, using the same data wires and multiplexed via bank select wires. 
While switching between banks is slightly slower than switching between columns in a row, it’s much faster than 
switching between rows in the same bank. 


A memory rank is thus made of (MEMORY_CELL_SIZE / MEMORY_CELL_SIZE_PER_CHIP) memory chips.
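As a worked instance of this ratio with sizes counted in bits (memory planes): 64-bit memory cells built from chips that each carry 16 planes per bank need 64 / 16 = 4 chips per rank. A minimal sketch with a hypothetical helper name:

```python
def chips_per_rank(cell_bits: int, planes_per_chip: int) -> int:
    """MEMORY_CELL_SIZE / MEMORY_CELL_SIZE_PER_CHIP, both counted in bits (planes)."""
    if cell_bits % planes_per_chip:
        raise ValueError("cell size must be a multiple of the per-chip width")
    return cell_bits // planes_per_chip


print(chips_per_rank(64, 16))   # 4
print(chips_per_rank(128, 32))  # 4
```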
One or two memory ranks connected via common wires [including data] except a chip select wire make up a memory
subpartition. Switching between ranks has basically the same performance consequences as switching between banks
in a rank - the only differences are the physical implementation and the possibility of using a different number of row
selection bits for each rank [though bank count and column count have to match].


The consequences of existence of several banks/ranks: 


* it’s important to ensure that data accessed together belongs to either the same row, or to different banks [to avoid 
row switching] 


* tiled memory layouts are designed so that a tile corresponds roughly to a row, and neighbouring tiles never share 
a bank 


Memory partitions and subpartitions 


A memory subpartition has its own DRAM controller on the GPU. 1 or 2 subpartitions make a memory partition, 
which is a fairly independent entity with its own memory access queue, own ZROP and CROP units, and own L2 
cache on later cards. All memory partitions taken together with the crossbar logic make up the entire VRAM logic for 
a GPU. 


All subpartitions in a partition have to be configured identically. Partitions in a GPU are usually configured identically,
but don't have to be on newer cards.


The consequences of subpartition/partition existence: 
* like banks, different partitions may be utilised to avoid row conflicts for related data 


* unlike banks, bandwidth suffers if (sub)partitions are not utilised equally - load balancing is thus very important 


Memory addressing 


While memory addressing is highly dependent on GPU family, the basic approach is outlined here.
The bits of a memory address are, in sequence, assigned to:
* identifying a byte inside a memory cell - since whole cells always have to be accessed anyway
* several column selection bits, to allow for a burst
* partition/subpartition selection - in low bits to ensure good load balancing, but not too low to keep relatively
large tiles in a single partition for ROP's benefit
* remaining column selection bits
* all/most of bank selection bits, sometimes a rank selection bit - so that immediately neighbouring addresses
never cause a row conflict
* row bits
* remaining bank bit or rank bit - effectively allows splitting VRAM into two areas, placing color buffer in one
and zeta buffer in the other, so that there are never row conflicts between them
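The bit assignment order above can be illustrated with a toy layout. All widths below are invented for illustration (64-bit cells, 8 column bits, 2 partitions, 8 banks, 4096 rows) and do not describe any particular GPU; real layouts are family-specific, and partition counts that are not a power of two cannot be selected by plain address bits. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy layout, lowest bits first, in the order described above. Widths are invented.
EXAMPLE_LAYOUT: List[Tuple[str, int]] = [
    ("byte", 3),       # byte inside a 64-bit memory cell
    ("col_lo", 3),     # column bits covering one burst
    ("partition", 1),  # partition select: 2 partitions
    ("col_hi", 5),     # remaining column bits (8 column bits in total)
    ("bank_lo", 2),    # most of the bank selection bits
    ("row", 12),       # row bits
    ("bank_hi", 1),    # remaining bank bit (8 banks in total)
]


def decompose(addr: int, layout=EXAMPLE_LAYOUT) -> Dict[str, int]:
    """Split an address into the fields of the given layout."""
    fields = {}
    shift = 0
    for name, width in layout:
        fields[name] = (addr >> shift) & ((1 << width) - 1)
        shift += width
    return fields


def row_conflict(a: int, b: int, layout=EXAMPLE_LAYOUT) -> bool:
    """True if a and b hit the same partition and bank but different rows."""
    fa, fb = decompose(a, layout), decompose(b, layout)
    same_bank = all(fa[k] == fb[k] for k in ("partition", "bank_lo", "bank_hi"))
    return same_bank and fa["row"] != fb["row"]


print(row_conflict(0x0, 0x1000))   # False: different bank
print(row_conflict(0x0, 0x4000))   # True: same bank, next row
```

In this toy layout any aligned 16 KiB block keeps bits 14 and up constant, so it maps to one row in each (partition, bank) pair and accesses inside it never cause a row conflict.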


2.7.2 NV1:G80 surface formats 


Contents 


* NV1:G80 surface formats
- Introduction


Todo: write me 


Introduction 


Todo: write me 


2.7.3 NV3 VRAM structure and usage 


Contents 


* NV3 VRAM structure and usage 


— Introduction 


Todo: write me 


Introduction 


Todo: write me 


2.7.4 NV3 DMA objects 


Contents 


* NV3 DMA objects 


— Introduction 


Todo: write me 


Introduction 


Todo: write me 
2.7.5 NV4:G80 DMA objects


Contents 


* NV4:G80 DMA objects 


— Introduction 


Todo: write me 


Introduction 


Todo: write me 


2.7.6 NV44 host memory interface 


Contents 


* NV44 host memory interface 


— Introduction 


— MMIO registers 


Todo: write me 


Introduction 


Todo: write me 


MMIO registers 


Todo: write me 


2.7.7 G80 surface formats 
Contents


* G80 surface formats
- Introduction
- Surface elements
- Pitch surfaces
- Blocklinear surfaces
- Textures, mipmapping and arrays
- Multisampled surfaces
- Surface formats
  * Simple color surface formats
  * Shared exponent color format
  * YUV color formats
  * Zeta surface format
  * Compressed texture formats
  * Bitmap surface format
- G80 storage types
  * Blocklinear color storage types
  * Zeta storage types
- GF100 storage types


Introduction 


This file deals with G80+ cards only. For older cards, see NV1:G80 surface formats.


A “surface” is a 2d or 3d array of elements. Surfaces are used for image storage, and can be bound to at least the
following slots on the engines:
* m2mf input and output buffers
* 2d source and destination surfaces
* 3d/compute texture units: the textures
* 3d color render targets
* 3d zeta render target
* compute g[] spaces [G80:GF100]
* 3d/compute image units [GF100+]
* PCOPY input and output buffers
* PDISPLAY: the framebuffer


Todo: vdec stuff 
Todo: GF100 ZCULL?


Surfaces on G80+ cards come in two types: pitch and blocklinear. Pitch surfaces have a simple format, but they
are limited to 2 dimensions only, don't support arrays nor mipmapping when used as textures, cannot be used for zeta
buffers, and have lower performance than blocklinear surfaces. Blocklinear surfaces can have up to three dimensions,
can be put into arrays and be mipmapped, and use custom element arrangement in memory. However, blocklinear
surfaces need to be placed in a memory area with a special storage type, depending on the surface format.


Blocklinear surfaces have two main levels of element rearrangement: high-level and low-level. Low-level rearrangement
is quite complicated, depends on the surface's storage type, and is hidden by the VM subsystem - if the surface is
accessed through VM with a properly set storage type, only the high-level rearrangement is visible. Thus the low-level
rearrangement can only be seen when accessing blocklinear system RAM directly from the CPU, or accessing blocklinear
VRAM with storage type set to 0. Also, low-level rearrangement for VRAM uses several tricks to distribute load
evenly across memory partitions, while rearrangement for system RAM skips them and merely reorders elements
inside a gob. High-level rearrangement, on the other hand, is relatively simple, and always visible to the user - knowing it
is needed to calculate the address of a given element, or to calculate the memory size of a surface.


Surface elements 


The basic unit of a surface is an "element", which can be 1, 2, 4, 8, or 16 bytes long. The element type is vital in selecting
the proper compressed storage type for a surface. For most surface formats, an element simply means a sample. This is
different for surfaces storing compressed textures - the elements are compressed blocks. Also, it's different for bitmap
textures - in these, an element is a 64-bit word containing an 8x8 block of samples.


While texture, RT, and 2d bindings operate on surface elements, some other binding points, like PCOPY and m2mf,
ignore the element size and treat the surface as an array of bytes. That is, a 16x16 surface of 4-byte elements is
treated as a 64x16 surface of bytes.


Pitch surfaces 


A pitch surface is a 2d array of elements, where each row is contiguous in memory, and each row starts at a fixed
distance from the start of the previous one. This distance is the surface's “pitch”. Pitch surfaces always use storage
type 0 [pitch].


The attributes defining a pitch surface are: 
* address: 40-bit VM address, aligned to 64 bytes
* pitch: distance between subsequent rows in bytes - needs to be a multiple of 64
* element size: implied by format, or defaulting to 1 if binding point is byte-oriented
* width: surface width in elements, only used when bounds checking / size information is needed
* height: surface height in elements, only used when bounds checking / size information is needed


Todo: check pitch, width, height min/max values. this may depend on binding point. check if 64 byte alignment still 
holds on GF100. 


The address of element (x,y) is:

address + pitch * y + elem_size * x

Or, alternatively, the address of byte (x,y) is:

address + pitch * y + x
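The two formulas above can be sketched in Python as follows. The function names and the alignment check are illustrative assumptions, not part of any real driver API; the check merely mirrors the 64-byte requirements in the attribute list.

```python
def pitch_element_address(address, pitch, elem_size, x, y):
    """Address of element (x, y) in a pitch surface: address + pitch * y + elem_size * x."""
    # Mirrors the attribute list: 64-byte aligned base, pitch a multiple of 64.
    if address % 64 or pitch % 64:
        raise ValueError("address and pitch must be multiples of 64")
    return address + pitch * y + elem_size * x


def pitch_byte_address(address, pitch, x, y):
    """Byte-oriented view (e.g. m2mf, PCOPY): the element size is treated as 1."""
    return pitch_element_address(address, pitch, 1, x, y)


# A 16x16 surface of 4-byte elements with a 64-byte pitch: element (3, 2)
# and byte (12, 2) land on the same address.
assert pitch_element_address(0x1000, 64, 4, 3, 2) == pitch_byte_address(0x1000, 64, 12, 2)
```

The byte-oriented variant shows why a byte-oriented binding point sees the 16x16 surface of 4-byte elements as a 64x16 surface of bytes: only the x coordinate is rescaled.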


Blocklinear surfaces 


A blocklinear surface is a 3d array of elements, stored in memory in units called “gobs” and “blocks”. There are two
levels of tiling. The lower-level unit is called a “gob” and has a fixed size. This size is 64 bytes x 4 x 1 on G80:GF100
cards, and 64 bytes x 8 x 1 on GF100+ cards. The higher-level unit is called a “block”, and is of variable size between
1x1x1 and 32x32x32 gobs.


The attributes defining a blocklinear surface are: 


* address: 40-bit VM address, aligned to gob size [0x100 bytes on G80:GF100, 0x200 bytes on GF100+]
* block width: 0-5, log2 of gobs per block in x dimension
* block height: 0-5, log2 of gobs per block in y dimension
* block depth: 0-5, log2 of gobs per block in z dimension
* element size: implied by format, or defaulting to 1 if the binding point is byte-oriented
* width: surface width [size in x dimension] in elements
* height: surface height [size in y dimension] in elements
* depth: surface depth [size in z dimension] in elements


Todo: check boundaries on them all, check tiling on GF100.


Todo: PCOPY surfaces with weird gob size 


It should be noted that some limits on these parameters are to some extent specific to the binding point. In particular,
block width greater than 0 is only supported by the render targets and texture units, with render targets only supporting
0 and 1. Block height of 0-5 can be safely used with all blocklinear surface binding points, and block depth of 0-5 can
be used with binding points other than G80 g[] spaces, which only support 0.


The blocklinear format works as follows: 


First, the block size is computed. This computation depends on the binding point: some binding points clamp the
effective block size in a given dimension to the smallest size that would cover the whole surface, some do not. The
ones that do are called “auto-sizing” binding points. One such binding point where it's important is the texture unit:
since all mipmap levels of a texture use a single “block size” field in TIC, auto-sizing is needed to ensure that small
mipmaps of a large surface don't use needlessly large blocks. Pseudocode:


bytes_per_gob_x = 64;
if (gpu < GF100)
    bytes_per_gob_y = 4;
else
    bytes_per_gob_y = 8;
bytes_per_gob_z = 1;
eff_block_width = block_width;
eff_block_height = block_height;
eff_block_depth = block_depth;
if (auto_sizing) {
    while (eff_block_width > 0 && (bytes_per_gob_x << (eff_block_width - 1)) >= width * element_size)
        eff_block_width--;
    while (eff_block_height > 0 && (bytes_per_gob_y << (eff_block_height - 1)) >= height)
        eff_block_height--;
    while (eff_block_depth > 0 && (bytes_per_gob_z << (eff_block_depth - 1)) >= depth)
        eff_block_depth--;
}
gobs_per_block_x = 1 << eff_block_width;
gobs_per_block_y = 1 << eff_block_height;
gobs_per_block_z = 1 << eff_block_depth;
bytes_per_block_x = bytes_per_gob_x * gobs_per_block_x;
bytes_per_block_y = bytes_per_gob_y * gobs_per_block_y;
bytes_per_block_z = bytes_per_gob_z * gobs_per_block_z;
elements_per_block_x = bytes_per_block_x / element_size;
gob_bytes = bytes_per_gob_x * bytes_per_gob_y * bytes_per_gob_z;
block_gobs = gobs_per_block_x * gobs_per_block_y * gobs_per_block_z;
block_bytes = gob_bytes * block_gobs;
block bytes = дор bytes х block gobs; 


Due to the auto-sizing being present on some binding points, it's a bad idea to use surfaces that have block size at 
least two times bigger than the actual surface - they'll be unusable on these binding points [and waste a lot of memory 
anyway]. 
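For experimentation, the block-size pseudocode can be transcribed into Python. This is a sketch: `effective_block_size` and its boolean parameters `gpu_is_gf100_plus` and `auto_sizing` are hypothetical names standing in for the GPU generation and the binding point's behaviour.

```python
def effective_block_size(gpu_is_gf100_plus, block_width, block_height, block_depth,
                         width, height, depth, element_size, auto_sizing):
    """Return (eff_block_width, eff_block_height, eff_block_depth, block_bytes)."""
    bytes_per_gob_x = 64
    bytes_per_gob_y = 8 if gpu_is_gf100_plus else 4   # really rows per gob
    bytes_per_gob_z = 1                               # really slices per gob
    eff_w, eff_h, eff_d = block_width, block_height, block_depth
    if auto_sizing:
        # Shrink a dimension while a block half as large would still cover the surface.
        while eff_w > 0 and (bytes_per_gob_x << (eff_w - 1)) >= width * element_size:
            eff_w -= 1
        while eff_h > 0 and (bytes_per_gob_y << (eff_h - 1)) >= height:
            eff_h -= 1
        while eff_d > 0 and (bytes_per_gob_z << (eff_d - 1)) >= depth:
            eff_d -= 1
    gob_bytes = bytes_per_gob_x * bytes_per_gob_y * bytes_per_gob_z
    block_gobs = (1 << eff_w) * (1 << eff_h) * (1 << eff_d)
    return eff_w, eff_h, eff_d, gob_bytes * block_gobs
```

With the parameters of the worked example below (G80, 13x17x3 surface, 16-byte elements, block width/height/depth of 1), the function returns `(1, 1, 1, 0x800)`, matching the 0x800-byte block size given there. A 4x4 surface with a 32x32x32-gob block request, on the other hand, collapses to a single gob on an auto-sizing binding point.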


Once the block size is known, the geometry and size of the surface can be determined. A surface is first broken down into
blocks. Each block covers a contiguous, aligned subarea of the surface that is elements_per_block_x x bytes_per_block_y
x bytes_per_block_z elements in size. If the surface size is not a multiple of the block size in any dimension, the size is
aligned up for surface layout purposes and the remaining space is unused. The blocks making up a surface are stored
sequentially in memory, first in x direction, then in y direction, then in z direction:


blocks_per_surface_x = ceil(width * element_size / bytes_per_block_x);
blocks_per_surface_y = ceil(height / bytes_per_block_y);
blocks_per_surface_z = ceil(depth / bytes_per_block_z);
surface_blocks = blocks_per_surface_x * blocks_per_surface_y * blocks_per_surface_z;
// total bytes in surface - surface resides at addresses [address, address + surface_bytes)
surface_bytes = surface_blocks * block_bytes;

block_address = address
    + floor(x_coord * element_size / bytes_per_block_x) * block_bytes
    + floor(y_coord / bytes_per_block_y) * block_bytes * blocks_per_surface_x
    + floor(z_coord / bytes_per_block_z) * block_bytes * blocks_per_surface_x * blocks_per_surface_y;
x_coord_in_block = (x_coord * element_size) % bytes_per_block_x;
y_coord_in_block = y_coord % bytes_per_block_y;
z_coord_in_block = z_coord % bytes_per_block_z;

Like blocks in the surface, gobs inside a block are stored ordered first by x coord, then by y coord, then by z coord: 


gob_address = block_address
    + floor(x_coord_in_block / bytes_per_gob_x) * gob_bytes
    + floor(y_coord_in_block / bytes_per_gob_y) * gob_bytes * gobs_per_block_x
    + z_coord_in_block * gob_bytes * gobs_per_block_x * gobs_per_block_y; // bytes_per_gob_z is always 1
x_coord_in_gob = x_coord_in_block % bytes_per_gob_x;
y_coord_in_gob = y_coord_in_block % bytes_per_gob_y;


The elements inside a gob are likewise stored ordered first by x coordinate, and then by y: 
element_address = gob_address + x_coord_in_gob + y_coord_in_gob * bytes_per_gob_x;


Note that the above is the higher-level rearrangement only - the element address resulting from the above pseudocode
is the address that the user would see by looking through the card's VM subsystem. The lower-level rearrangement is
storage type dependent, invisible to the user, and will be covered below.
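Chaining the block, gob, and element steps gives a complete high-level address calculation. The sketch below takes the effective block size (after any auto-sizing) as input; `blocklinear_address` is a hypothetical helper name, and, like the pseudocode, it ignores the storage-type-dependent low-level rearrangement.

```python
def blocklinear_address(address, element_size, width, height, depth,
                        eff_block_width, eff_block_height, eff_block_depth,
                        x, y, z, gpu_is_gf100_plus=False):
    """Return (element_address, surface_bytes) for element (x, y, z)."""
    def ceil_div(a, b):
        return -(-a // b)

    # Gob geometry; "bytes_per_gob_y/z" are really rows/slices, as in the pseudocode.
    bytes_per_gob_x, bytes_per_gob_z = 64, 1
    bytes_per_gob_y = 8 if gpu_is_gf100_plus else 4
    gobs_per_block_x = 1 << eff_block_width
    gobs_per_block_y = 1 << eff_block_height
    gobs_per_block_z = 1 << eff_block_depth
    bytes_per_block_x = bytes_per_gob_x * gobs_per_block_x
    bytes_per_block_y = bytes_per_gob_y * gobs_per_block_y
    bytes_per_block_z = bytes_per_gob_z * gobs_per_block_z
    gob_bytes = bytes_per_gob_x * bytes_per_gob_y * bytes_per_gob_z
    block_bytes = gob_bytes * gobs_per_block_x * gobs_per_block_y * gobs_per_block_z

    # Surface size: blocks are laid out x first, then y, then z.
    blocks_x = ceil_div(width * element_size, bytes_per_block_x)
    blocks_y = ceil_div(height, bytes_per_block_y)
    blocks_z = ceil_div(depth, bytes_per_block_z)
    surface_bytes = blocks_x * blocks_y * blocks_z * block_bytes

    # Block containing the element, and the element's position inside it.
    x_bytes = x * element_size
    block_address = (address
                     + (x_bytes // bytes_per_block_x) * block_bytes
                     + (y // bytes_per_block_y) * block_bytes * blocks_x
                     + (z // bytes_per_block_z) * block_bytes * blocks_x * blocks_y)
    x_in_block = x_bytes % bytes_per_block_x
    y_in_block = y % bytes_per_block_y
    z_in_block = z % bytes_per_block_z

    # Gob inside the block, again ordered x, then y, then z.
    gob_address = (block_address
                   + (x_in_block // bytes_per_gob_x) * gob_bytes
                   + (y_in_block // bytes_per_gob_y) * gob_bytes * gobs_per_block_x
                   + z_in_block * gob_bytes * gobs_per_block_x * gobs_per_block_y)

    # Element inside the gob: row-major, 64 bytes per row.
    x_in_gob = x_in_block % bytes_per_gob_x
    y_in_gob = y_in_block % bytes_per_gob_y
    return gob_address + x_in_gob + y_in_gob * bytes_per_gob_x, surface_bytes
```

For the 13x17x3 example below, `blocklinear_address(0, 16, 13, 17, 3, 1, 1, 1, 8, 0, 1)` yields `(0xc00, 0x6000)`, and element (12, 16, 2) maps to 0x5900.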


As an example, let's take a 13 x 17 x 3 surface with element size of 16 bytes, block width of 1, block height of 1, and 
block depth of 1. Further, the card is assumed to be G80. The surface will be located in memory the following way: 


* block size in bytes = 0x800 bytes 

* block width: 128 bytes / 8 elements 
* block height: 8 

* block depth: 2 

* surface width in blocks: 2 

* surface height in blocks: 3 

* surface depth in blocks: 2 


* surface memory size: 0x6000 bytes 


|         - x element boundary
||        - x gob boundary
|||       - x block boundary
[no line] - y element boundary
---       - y gob boundary
===       - y block boundary

z == 1:
      x -->
y    |0   |1   |2   |3   ||4   |5   |6   |7   |||8   |9   |10  |11  ||12
|
V   0|0400|0410|0420|0430||0500|0510|0520|0530|||0c00|0c10|0c20|0c30||0d00
    1|0440|0450|0460|0470||0540|0550|0560|0570|||0c40|0c50|0c60|0c70||0d40
    2|0480|0490|04a0|04b0||0580|0590|05a0|05b0|||0c80|0c90|0ca0|0cb0||0d80
    3|04c0|04d0|04e0|04f0||05c0|05d0|05e0|05f0|||0cc0|0cd0|0ce0|0cf0||0dc0
     ---------------------------------------------------------------------
    4|0600|0610|0620|0630||0700|0710|0720|0730|||0e00|0e10|0e20|0e30||0f00
    5|0640|0650|0660|0670||0740|0750|0760|0770|||0e40|0e50|0e60|0e70||0f40
    6|0680|0690|06a0|06b0||0780|0790|07a0|07b0|||0e80|0e90|0ea0|0eb0||0f80
    7|06c0|06d0|06e0|06f0||07c0|07d0|07e0|07f0|||0ec0|0ed0|0ee0|0ef0||0fc0
     =====================================================================
    8|1400|1410|1420|1430||1500|1510|1520|1530|||1c00|1c10|1c20|1c30||1d00
    9|1440|1450|1460|1470||1540|1550|1560|1570|||1c40|1c50|1c60|1c70||1d40
   10|1480|1490|14a0|14b0||1580|1590|15a0|15b0|||1c80|1c90|1ca0|1cb0||1d80
   11|14c0|14d0|14e0|14f0||15c0|15d0|15e0|15f0|||1cc0|1cd0|1ce0|1cf0||1dc0
     ---------------------------------------------------------------------
   12|1600|1610|1620|1630||1700|1710|1720|1730|||1e00|1e10|1e20|1e30||1f00
   13|1640|1650|1660|1670||1740|1750|1760|1770|||1e40|1e50|1e60|1e70||1f40
   14|1680|1690|16a0|16b0||1780|1790|17a0|17b0|||1e80|1e90|1ea0|1eb0||1f80
   15|16c0|16d0|16e0|16f0||17c0|17d0|17e0|17f0|||1ec0|1ed0|1ee0|1ef0||1fc0
     =====================================================================
   16|2400|2410|2420|2430||2500|2510|2520|2530|||2c00|2c10|2c20|2c30||2d00

[block boundary here]

z == 2:
      x -->
y    |0   |1   |2   |3   ||4   |5   |6   |7   |||8   |9   |10  |11  ||12
|
V   0|3000|3010|3020|3030||3100|3110|3120|3130|||3800|3810|3820|3830||3900
    1|3040|3050|3060|3070||3140|3150|3160|3170|||3840|3850|3860|3870||3940
    2|3080|3090|30a0|30b0||3180|3190|31a0|31b0|||3880|3890|38a0|38b0||3980
    3|30c0|30d0|30e0|30f0||31c0|31d0|31e0|31f0|||38c0|38d0|38e0|38f0||39c0
     ---------------------------------------------------------------------
    4|3200|3210|3220|3230||3300|3310|3320|3330|||3a00|3a10|3a20|3a30||3b00
    5|3240|3250|3260|3270||3340|3350|3360|3370|||3a40|3a50|3a60|3a70||3b40
    6|3280|3290|32a0|32b0||3380|3390|33a0|33b0|||3a80|3a90|3aa0|3ab0||3b80
    7|32c0|32d0|32e0|32f0||33c0|33d0|33e0|33f0|||3ac0|3ad0|3ae0|3af0||3bc0
     =====================================================================
    8|4000|4010|4020|4030||4100|4110|4120|4130|||4800|4810|4820|4830||4900
    9|4040|4050|4060|4070||4140|4150|4160|4170|||4840|4850|4860|4870||4940
   10|4080|4090|40a0|40b0||4180|4190|41a0|41b0|||4880|4890|48a0|48b0||4980
   11|40c0|40d0|40e0|40f0||41c0|41d0|41e0|41f0|||48c0|48d0|48e0|48f0||49c0
     ---------------------------------------------------------------------
   12|4200|4210|4220|4230||4300|4310|4320|4330|||4a00|4a10|4a20|4a30||4b00
   13|4240|4250|4260|4270||4340|4350|4360|4370|||4a40|4a50|4a60|4a70||4b40
   14|4280|4290|42a0|42b0||4380|4390|43a0|43b0|||4a80|4a90|4aa0|4ab0||4b80
   15|42c0|42d0|42e0|42f0||43c0|43d0|43e0|43f0|||4ac0|4ad0|4ae0|4af0||4bc0
     =====================================================================
   16|5000|5010|5020|5030||5100|5110|5120|5130|||5800|5810|5820|5830||5900
Textures, mipmapping and arrays


A texture on G80/GF100 can have one of 9 types:
* 1D: made of 1 or more mip levels, each mip level is a blocklinear surface with height and depth forced to 1
* 2D: made of 1 or more mip levels, each mip level is a blocklinear surface with depth forced to 1
* 3D: made of 1 or more mip levels, each mip level is a blocklinear surface
* 1D ARRAY: made of some number of subtextures, each subtexture is like a single 1D texture
* 2D ARRAY: made of some number of subtextures, each subtexture is like a single 2D texture
* CUBE: made of 6 subtextures, each subtexture is like a single 2D texture - has the same layout as a 2D ARRAY
  with 6 subtextures, but different semantics
* BUFFER: a simple packed 1D array of elements - not a surface
* RECT: a single pitch surface, or a single blocklinear surface with depth forced to 1
* CUBE ARRAY [GT215+]: like 2D ARRAY, but subtexture count has to be divisible by 6, and groups of 6
  subtextures behave like CUBE textures


Types other than BUFFER and RECT are made of subtextures, which are in turn made of mip levels, which are 
blocklinear surfaces. For such textures, only the parameters of the first mip level of the first subtexture are specified - 
parameters of the following mip levels and subtextures are calculated automatically. 


Each mip level has each dimension 2 times smaller than the corresponding dimension of the previous mip level, rounding
down unless it would result in a size of 0. Since texture units use auto-sizing for the block size, the block sizes will be
different between mip levels. The surface for each mip level starts right after the previous one ends. Also, the total
size of the subtexture is rounded up to a multiple of the 0th mip level's block size:


mip_address[0] = subtexture_address;
mip_width[0] = texture_width;
mip_height[0] = texture_height;
mip_depth[0] = texture_depth;
mip_bytes[0] = calc_surface_bytes(mip[0]);
subtexture_bytes = mip_bytes[0];
for (i = 1; i <= max_mip_level; i++) {
    mip_address[i] = mip_address[i-1] + mip_bytes[i-1];
    mip_width[i] = max(1, floor(mip_width[i-1] / 2));
    mip_height[i] = max(1, floor(mip_height[i-1] / 2));
    mip_depth[i] = max(1, floor(mip_depth[i-1] / 2));
    mip_bytes[i] = calc_surface_bytes(mip[i]);
    subtexture_bytes += mip_bytes[i];
}
subtexture_bytes = alignup(subtexture_bytes, calc_surface_block_bytes(mip[0]));


For 1D ARRAY, 2D ARRAY, CUBE and CUBE ARRAY textures, the subtextures are stored sequentially:

for (i = 0; i < subtexture_count; i++) {
    subtexture_address[i] = texture_address + i * subtexture_bytes;
}
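The mip-level and array layout rules can be combined into one helper. This sketch assumes the texture unit's auto-sizing applies to every mip level, as stated above, and uses hypothetical names (`mip_surface`, `texture_layout`); `max_mip_level` follows the pseudocode.

```python
def ceil_div(a, b):
    return -(-a // b)


def mip_surface(width, height, depth, element_size, bw, bh, bd, gf100):
    """Return (surface_bytes, block_bytes) of one mip level, with auto-sizing."""
    gx, gy, gz = 64, (8 if gf100 else 4), 1
    # Auto-sizing: shrink the block while half of it still covers the level.
    while bw > 0 and (gx << (bw - 1)) >= width * element_size:
        bw -= 1
    while bh > 0 and (gy << (bh - 1)) >= height:
        bh -= 1
    while bd > 0 and (gz << (bd - 1)) >= depth:
        bd -= 1
    block_bytes = (gx * gy * gz) << (bw + bh + bd)
    blocks = (ceil_div(width * element_size, gx << bw)
              * ceil_div(height, gy << bh)
              * ceil_div(depth, gz << bd))
    return blocks * block_bytes, block_bytes


def texture_layout(texture_address, width, height, depth, element_size,
                   block_width, block_height, block_depth,
                   max_mip_level, subtexture_count=1, gf100=False):
    """Return (mip_addresses, subtexture_bytes); mip_addresses[i][level]."""
    mip_offsets = []
    subtexture_bytes = 0
    w, h, d = width, height, depth
    for level in range(max_mip_level + 1):
        if level > 0:
            w, h, d = max(1, w // 2), max(1, h // 2), max(1, d // 2)
        size, block_bytes = mip_surface(w, h, d, element_size,
                                        block_width, block_height, block_depth, gf100)
        if level == 0:
            level0_block_bytes = block_bytes
        mip_offsets.append(subtexture_bytes)  # each level starts where the previous ended
        subtexture_bytes += size
    # Round the subtexture up to a multiple of mip level 0's (auto-sized) block size.
    subtexture_bytes = ceil_div(subtexture_bytes, level0_block_bytes) * level0_block_bytes
    mip_addresses = [[texture_address + i * subtexture_bytes + off for off in mip_offsets]
                     for i in range(subtexture_count)]
    return mip_addresses, subtexture_bytes
```

For example, a 16x16 texture with 4-byte elements, block height 4 (other block dimensions 0) and mip levels 0-4 on G80 gets mip offsets 0x0, 0x400, 0x600, 0x700 and 0x800 and a subtexture size of 0xc00 bytes, so six CUBE faces with those parameters would start 0xc00 bytes apart.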


For more information about textures, see graph/g80-texture.txt 


Multisampled surfaces 


Some surfaces are used as multisampled surfaces. This includes surfaces bound as color and zeta render targets when 
multisampling type is other than 1X, as well as multisampled textures on GF100+. 
A multisampled surface contains several samples per pixel. A "sample" is a single set of RGBA or depth/stencil
values [depending on surface type]. These samples correspond to various points inside the pixel, called sample
positions. When a multisample surface has to be displayed, it is downsampled to a normal surface by an operation called
"resolving".


G80+ GPUs also support a variant of multisampling called “coverage sampling” or CSAA. When CSAA is used,
there are two sample types: full samples and coverage samples. Full samples behave as in normal multisampling. 
Coverage samples have assigned positions inside a pixel, but their values are not stored in the render target surfaces 
when rendering. Instead, a special component, called C or coverage, is added to the zeta surface, and for each coverage 
sample, a bitmask of full samples with the same value is stored. During the resolve process, this bitmask is used to 
assign different weights to the full samples depending on the count of coverage samples with matching values, thus 
improving picture quality. Note that the C component conceptually belongs to a whole pixel, not to individual samples. 
However, for surface layout purposes, its value is split into several parts, and each of the parts is stored together with 
one of the samples. 


For the most part, multisampling mode does not affect surface layout - in fact, a multisampled render target is bound 
as a non-multisampled texture for the resolving process. However, multisampling mode is vital for CSAA zeta surface 
layout, and for render target storage type selection if compression is to be used - the compression schema used is 
directly tied to multisampling mode. 


The following multisample modes exist: 
* mode 0x0: MS1 [1x1] - no multisampling
  - sample 0: (0x0.8, 0x0.8) [0,0]
* mode 0x1: MS2 [2x1]
  - sample 0: (0x0.4, 0x0.4) [0,0]
  - sample 1: (0x0.c, 0x0.c) [1,0]
* mode 0x2: MS4 [2x2]
  - sample 0: (0x0.6, 0x0.2) [0,0]
  - sample 1: (0x0.e, 0x0.6) [1,0]
  - sample 2: (0x0.2, 0x0.a) [0,1]
  - sample 3: (0x0.a, 0x0.e) [1,1]
* mode 0x3: MS8 [4x2]
  - sample 0: (0x0.1, 0x0.7) [0,0]
  - sample 1: (0x0.5, 0x0.3) [1,0]
  - sample 2: (0x0.3, 0x0.d) [0,1]
  - sample 3: (0x0.7, 0x0.b) [1,1]
  - sample 4: (0x0.9, 0x0.5) [2,0]
  - sample 5: (0x0.f, 0x0.1) [3,0]
  - sample 6: (0x0.b, 0x0.f) [2,1]
  - sample 7: (0x0.d, 0x0.9) [3,1]
* mode 0x4: MS2_ALT [2x1] [GT215+]
  - sample 0: (0x0.c, 0x0.c) [1,0]
  - sample 1: (0x0.4, 0x0.4) [0,0]
* mode 0x5: MS8_ALT [4x2] [GT215+]
  - sample 0: (0x0.9, 0x0.5) [2,0]
  - sample 1: (0x0.7, 0x0.b) [1,1]
  - sample 2: (0x0.d, 0x0.9) [3,1]
  - sample 3: (0x0.5, 0x0.3) [1,0]
  - sample 4: (0x0.3, 0x0.d) [0,1]
  - sample 5: (0x0.1, 0x0.7) [0,0]
  - sample 6: (0x0.b, 0x0.f) [2,1]
  - sample 7: (0x0.f, 0x0.1) [3,0]
* mode 0x6: ??? [GF100+] [XXX]
* mode 0x8: MS4_CS4 [2x2]
  - sample 0: (0x0.6, 0x0.2) [0,0]
  - sample 1: (0x0.e, 0x0.6) [1,0]
  - sample 2: (0x0.2, 0x0.a) [0,1]
  - sample 3: (0x0.a, 0x0.e) [1,1]
  - coverage sample 4: (0x0.5, 0x0.7), belongs to 1, 3, 0, 2
  - coverage sample 5: (0x0.9, 0x0.4), belongs to 3, 2, 1, 0
  - coverage sample 6: (0x0.7, 0x0.c), belongs to 0, 1, 2, 3
  - coverage sample 7: (0x0.b, 0x0.9), belongs to 2, 0, 3, 1
  - C component is 16 bits per pixel, bitfields:
    - 0-3: sample 4 associations: 0, 1, 2, 3
    - 4-7: sample 5 associations: 0, 1, 2, 3
    - 8-11: sample 6 associations: 0, 1, 2, 3
    - 12-15: sample 7 associations: 0, 1, 2, 3
* mode 0x9: MS4_CS12 [2x2]
  - sample 0: (0x0.6, 0x0.1) [0,0]
  - sample 1: (0x0.f, 0x0.6) [1,0]
  - sample 2: (0x0.1, 0x0.a) [0,1]
  - sample 3: (0x0.a, 0x0.f) [1,1]
  - coverage sample 4: (0x0.4, 0x0.e), belongs to 2, 3
  - coverage sample 5: (0x0.c, 0x0.3), belongs to 1, 0
  - coverage sample 6: (0x0.4, 0x0.4), belongs to 3, 1
  - coverage sample 7: (0x0.4, 0x0.4), belongs to 0, 2
  - coverage sample 8: (0x0.9, 0x0.5), belongs to 0, 1, 2
  - coverage sample 9: (0x0.7, 0x0.7), belongs to 0, 2, 1, 3
  - coverage sample a: (0x0.b, 0x0.8), belongs to 1, 3, 0
  - coverage sample b: (0x0.3, 0x0.8), belongs to 2, 0, 3
  - coverage sample c: (0x0.8, 0x0.c), belongs to 3, 2, 1
  - coverage sample d: (0x0.2, 0x0.2), belongs to 0, 2
  - coverage sample e: (0x0.5, 0x0.5), belongs to 2, 3, 0, 1
  - coverage sample f: (0x0.e, 0x0.9), belongs to 1, 3
  - C component is 32 bits per pixel, bitfields:
    - 0-1: sample 4 associations: 2, 3
    - 2-3: sample 5 associations: 0, 1
    - 4-5: sample 6 associations: 1, 3
    - 6-7: sample 7 associations: 0, 2
    - 8-10: sample 8 associations: 0, 1, 2
    - 11-14: sample 9 associations: 0, 1, 2, 3
    - 15-17: sample a associations: 0, 1, 3
    - 18-20: sample b associations: 0, 2, 3
    - 21-23: sample c associations: 1, 2, 3
    - 24-25: sample d associations: 0, 2
    - 26-29: sample e associations: 0, 1, 2, 3
    - 30-31: sample f associations: 1, 3
* mode 0xa: MS8_CS8 [4x2]
  - sample 0: (0x0.1, 0x0.3) [0,0]
  - sample 1: (0x0.6, 0x0.4) [1,0]
  - sample 2: (0x0.3, 0x0.0) [0,1]
  - sample 3: (0x0.4, 0x0.b) [1,1]
  - sample 4: (0x0.c, 0x0.1) [2,0]
  - sample 5: (0x0.e, 0x0.7) [3,0]
  - sample 6: (0x0.8, 0x0.8) [2,1]
  - sample 7: (0x0.f, 0x0.4) [3,1]
  - coverage sample 8: (0x0.5, 0x0.7), belongs to 1, 6, 3, 0
  - coverage sample 9: (0x0.7, 0x0.2), belongs to 1, 0, 4, 6
  - coverage sample a: (0x0.b, 0x0.6), belongs to 5, 6, 1, 4
  - coverage sample b: (0x0.4, 0x0.3), belongs to 4, 5, 6, 1
  - coverage sample c: (0x0.2, 0x0.9), belongs to 3, 0, 2, 1
  - coverage sample d: (0x0.7, 0x0.c), belongs to 3, 2, 6, 7
  - coverage sample e: (0x0.a, 0x0.e), belongs to 7, 3, 2, 6
  - coverage sample f: (0x0.c, 0x0.a), belongs to 5, 6, 7, 3
  - C component is 32 bits per pixel, bitfields:
    - 0-3: sample 8 associations: 0, 1, 3, 6
    - 4-7: sample 9 associations: 0, 1, 4, 6
    - 8-11: sample a associations: 1, 4, 5, 6
    - 12-15: sample b associations: 1, 4, 5, 6
    - 16-19: sample c associations: 0, 1, 2, 3
    - 20-23: sample d associations: 2, 3, 6, 7
    - 24-27: sample e associations: 2, 3, 6, 7
    - 28-31: sample f associations: 3, 5, 6, 7
* mode 0xb: MS8_CS24 [GF100+]


Todo: wtf is up with modes 4 and 5?


Todo: nail down MS8 CS24 sample positions 


Todo: figure out mode 6 


Todo: figure out MS8 CS24 C component 


Note that MS8 and MS8_C* modes cannot be used with surfaces that have 16-byte element size due to a hardware
limitation. Also, multisampling is only possible with blocklinear surfaces.


Todo: check MS8/128bpp on GF100.


The sample ids are, for full samples, the values appearing in the sampleid register. The numbers in () are the geometric
coordinates of the sample inside a pixel, as used by the rasterization process. The dimensions in [] are the dimensions of
the block that represents a pixel in the surface - if it's 4x2, each pixel is represented in the surface as a block 4 elements
wide and 2 elements tall. The numbers in [] after each full sample are the coordinates inside this block.
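The block representation amounts to a simple coordinate mapping. The sketch hard-codes only the MS4 and MS8 block coordinates from the list above; the dictionary layout and the function names are illustrative assumptions.

```python
# Pixel block size and [x,y] coordinates of each full sample inside the block,
# transcribed from the MS4 (mode 0x2) and MS8 (mode 0x3) entries.
MS4 = {"block": (2, 2), "coords": [(0, 0), (1, 0), (0, 1), (1, 1)]}
MS8 = {"block": (4, 2),
       "coords": [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 0), (2, 1), (3, 1)]}


def sample_element(mode, pixel_x, pixel_y, sample_id):
    """Element coordinates in the surface that hold sample `sample_id` of a pixel."""
    block_w, block_h = mode["block"]
    sx, sy = mode["coords"][sample_id]
    return pixel_x * block_w + sx, pixel_y * block_h + sy


def surface_size_in_elements(mode, width_px, height_px):
    """Element dimensions of a surface storing width_px x height_px pixels."""
    block_w, block_h = mode["block"]
    return width_px * block_w, height_px * block_h
```

With MS8, sample 6 of pixel (3, 5) lands at element (14, 11), and a 640x480-pixel render target occupies 2560x960 elements.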


Each coverage sample “belongs to” several full samples. For every such pair of coverage sample and full sample,
the C component contains a bit that tells if the coverage sample's value is the same as the full one's, i.e. if the last
rendered primitive that covered the full sample also covered the coverage sample. When the surface is resolved, each
sample will “contribute” to exactly one full sample. The full samples always contribute to themselves, while each coverage
sample contributes to the first full sample it belongs to, in the order listed above, that has the relevant bit set in the
C component of the zeta surface. If none of the C bits for a given coverage sample are set, the sample will default to
contributing to the first sample in its belongs list. Then, for each full sample, the number of samples contributing to it
is counted, and used as its weight when performing the downsample calculation.
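The resolve weighting can be sketched for MS4_CS4 (mode 0x8). The bit numbering below packs each coverage sample's associations in coverage-sample order and, within one coverage sample, by full sample id; this is an inference that agrees with the bitfield lists given for MS4_CS4, MS4_CS12 and MS8_CS8, and the function names are hypothetical.

```python
# Belongs lists of MS4_CS4 (mode 0x8): coverage sample -> full samples, in priority order.
MS4_CS4_BELONGS = {4: [1, 3, 0, 2], 5: [3, 2, 1, 0], 6: [0, 1, 2, 3], 7: [2, 0, 3, 1]}
MS4_CS4_FULL = [0, 1, 2, 3]


def c_bit_positions(belongs):
    """Map (coverage sample, full sample) -> bit index in the per-pixel C value."""
    positions = {}
    bit = 0
    for cov in sorted(belongs):
        for full in sorted(belongs[cov]):   # bits are sorted by sample id, not priority
            positions[(cov, full)] = bit
            bit += 1
    return positions


def resolve_weights(c_value, belongs, full_samples):
    """Count how many samples contribute to each full sample."""
    bitpos = c_bit_positions(belongs)
    weights = {s: 1 for s in full_samples}   # full samples contribute to themselves
    for cov, order in belongs.items():
        target = order[0]                     # default when no relevant bit is set
        for full in order:
            if c_value >> bitpos[(cov, full)] & 1:
                target = full
                break
        weights[target] += 1
    return weights
```

With C = 0 every coverage sample falls back to the head of its belongs list and each full sample gets weight 2; setting only bit 0 (coverage sample 4 matches full sample 0) moves one unit of weight from sample 1 to sample 0.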


Note that, while the belongs list orderings are carefully chosen based on sample locations and to even the weights, the 
bits in C component don't use this ordering and are sorted by sample id instead. 


The C component is 16 or 32 bits per pixel, depending on the format. It is then split into 8-bit chunks, starting from the
LSB, and each chunk is assigned to one of the full samples. For MS4_CS4 and MS8_CS8, only samples in the top
line of each block get a chunk assigned; for MS4_CS12, all samples get a chunk. The chunks are assigned to samples
ordered first by x coordinate of the sample, then by its y coordinate.
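The chunk assignment can be sketched as follows. For MS4_CS4 and MS8_CS8 only the top line holds chunks, so the order is simply by x; for MS4_CS12 the sketch assumes x varies fastest (top line left to right, then the next line), mirroring the "first in x, then in y" convention used for blocks and gobs elsewhere in this document. The names are hypothetical.

```python
# Block coordinates of the full samples, transcribed from the mode list.
MS4_COORDS = [(0, 0), (1, 0), (0, 1), (1, 1)]
MS8_COORDS = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 0), (2, 1), (3, 1)]


def c_chunks(c_value, c_bits, sample_coords, top_line_only):
    """Split a per-pixel C value into 8-bit chunks (LSB first) and map each
    chunk to the full sample that stores it."""
    holders = [(y, x, sample) for sample, (x, y) in enumerate(sample_coords)
               if not top_line_only or y == 0]
    holders.sort()  # assumption for MS4_CS12: x varies fastest, then y
    if len(holders) != c_bits // 8:
        raise ValueError("chunk count does not match the number of holder samples")
    return {sample: (c_value >> (8 * i)) & 0xff
            for i, (_, _, sample) in enumerate(holders)}
```

For MS8_CS8, a C value of 0x44332211 is stored as 0x11 in sample 0, 0x22 in sample 1, 0x33 in sample 4 and 0x44 in sample 5.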


2.7. Memory access and structure 117 


nVidia Hardware Documentation, Release git 


Surface formats 


A surface's format determines the type of information it stores in its elements, the element size, and the element 
layout. Not all binding points care about the format - m2mf and PCOPY treat all surfaces as arrays of bytes. Also, 
format specification differs a lot between the binding points that make use of it - 2d engine and render targets use a 
big enum of valid formats, with values specifying both the layout and components, while texture units decouple layout 
specification from component assignment and type selection, allowing arbitrary swizzles. 


There are 3 main enums used for specifying surface formats: 
* texture format: used for textures, specifies element size and layout, but not the component assignments nor type
* color format: used for color RTs and the 2d engine, specifies the full format
* zeta format: used for zeta RTs, specifies the full format, except the specific coverage sampling mode, if applicable


The surface formats can be broadly divided into the following categories: 


* simple color formats: elements correspond directly to samples. Each element has 1 to 4 bitfields corresponding
  to R, G, B, A components. Usable for texturing, color RTs, and 2d engine.
* shared exponent color format: like above, but the components are floats sharing the exponent bitfield. Usable
  for texturing only.
* YUV color formats: element corresponds to two pixels lying in the same horizontal line. The pixels have three
  components, conventionally labeled as Y, U, V. U and V components are common for the two pixels making up
  an element, but Y components are separate. Usable for texturing only.
* zeta formats: elements correspond to samples. There is a per-sample depth component, optionally a per-sample
  stencil component, and optionally a per-pixel coverage value for CSAA surfaces. Usable for texturing and ZETA
  RT.
* compressed texture formats: elements correspond to blocks of samples, and are decoded to RGBA color values
  on the fly. Can be used only for texturing.
* bitmap texture format: each element corresponds to an 8x8 block of samples, with 1 bit per sample. Has to be used
  with a special texture sampler. Usable for texturing and 2d engine.


Todo: wtf is color format 0x1d?


Simple color surface formats 


A simple color surface is a surface where each element corresponds directly to a sample, each sample has 4 components
known as R, G, B, A [in that order], and the bitfields in the element correspond directly to components. There can be fewer
bitfields than components - the remaining components will be ignored on write, and get a default value on read, which
is 0 for R, G, B and 1 for A.


When bound to texture unit, the simple color formats are specified in three parts. First, the format is specified, which 
is an enumerated value shared with other format types. This format specifies the format type and, for simple color 
formats, element size, and location of bitfields inside the element. Then, the type [float/sint/uint/unorm/snorm] of each 
element component is specified. Finally, a swizzle is specified: each of the 4 component outputs [R, G, B, A] from 
the texture unit can be mapped to any of the components present in the element [called C0-C3], constant 0, integer 
constant 1, or float constant 1. 


Thanks to the swizzle capability, there's no need to support multiple orderings in the format itself, and all simple color
texture formats have the C0 bitfield starting at the LSB of the first byte, C1 [if present] at the first bit after C0, and so on.
Thus it's enough to specify bitfield lengths to uniquely identify a texture format: for example 5 5 6 is a format with 3
components and element size of 2 bytes, C0 at bits 0-4, C1 at bits 5-9, and C2 at bits 10-15. The element is always
treated as a little-endian word of the proper size, and bitfields are listed from the LSB side. Also, in some cases the texture
format has bitfields used only for padding, and not usable as components: these will be listed in the name as X<size>.
For example, 32 8 X24 is a format with element size of 8 bytes, where bits 0-31 are C0, 32-39 are C1, and 40-63 are
unusable. [XXX: what exactly happens to element layout in big-endian mode?]
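The naming scheme maps directly to a bitfield split. The helper below is a sketch that covers only the little-endian case described above (big-endian behaviour is an open question in the text); `split_bitfields` is a hypothetical name, and padding fields such as X24 are simply returned along with the usable ones.

```python
def split_bitfields(element: bytes, widths):
    """Split a little-endian element into bitfields listed from the LSB side.

    `widths` is the texture format's bitfield list, e.g. [5, 5, 6] for "5 5 6"
    or [32, 8, 24] for "32 8 X24".
    """
    if sum(widths) != 8 * len(element):
        raise ValueError("bitfield widths must cover the whole element")
    value = int.from_bytes(element, "little")
    fields = []
    shift = 0
    for width in widths:
        fields.append((value >> shift) & ((1 << width) - 1))
        shift += width
    return fields
```

Splitting the 2-byte element 0xfc1f (bytes 1f fc) with the 5 5 6 layout yields C0 = 31, C1 = 0 and C2 = 63.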


However, when bound to RTs or the 2d engine, all of the format, including element size, element layout, component 
types, component assignment, and SRGB flag, is specified by a single enumerated value. These formats have a many- 
to-one relationship to texture formats, and are listed here below the corresponding one. The information listed here 
for a format is C0-C3 assignments to actual components and component type, plus SRGB flag where applicable. The
components can be R, G, B, A, representing a bitfield corresponding directly to a single component, X representing 
an unused bitfield, or Y representing a bitfield copied to all components on read, and filled with the R value on write. 


The formats are: 
Element size 16:

* texture format 0x01: 32 32 32 32
  - color format 0xc0: RGBA, float
  - color format 0xc1: RGBA, sint
  - color format 0xc2: RGBA, uint
  - color format 0xc3: RGBX, float
  - color format 0xc4: RGBX, sint
  - color format 0xc5: RGBX, uint

Element size 8:

* texture format 0x03: 16 16 16 16
  - color format 0xc6: RGBA, unorm
  - color format 0xc7: RGBA, snorm
  - color format 0xc8: RGBA, sint
  - color format 0xc9: RGBA, uint
  - color format 0xca: RGBA, float
  - color format 0xce: RGBX, float
* texture format 0x04: 32 32
  - color format 0xcb: RG, float
  - color format 0xcc: RG, sint
  - color format 0xcd: RG, uint
* texture format 0x05: 32 8 X24

Element size 4:

* texture format 0x07: 8 8 8 X8

Todo: htf do I determine if a surface format counts as 0x07 or 0x08?

* texture format 0x08: 8 8 8 8
  - color format 0xcf: BGRA, unorm
  - color format 0xd0: BGRA, unorm, SRGB
  - color format 0xd5: RGBA, unorm
  - color format 0xd6: RGBA, unorm, SRGB
  - color format 0xd7: RGBA, snorm
  - color format 0xd8: RGBA, sint
  - color format 0xd9: RGBA, uint
  - color format 0xe6: BGRX, unorm
  - color format 0xe7: BGRX, unorm, SRGB
  - color format 0xf9: RGBX, unorm
  - color format 0xfa: RGBX, unorm, SRGB
  - color format 0xfd: BGRX, unorm [XXX]
  - color format 0xfe: BGRX, unorm [XXX]
* texture format 0x09: 10 10 10 2
  - color format 0xd1: RGBA, unorm
  - color format 0xd2: RGBA, uint
  - color format 0xdf: BGRA, unorm
* texture format 0x0c: 16 16
  - color format 0xda: RG, unorm
  - color format 0xdb: RG, snorm
  - color format 0xdc: RG, sint
  - color format 0xdd: RG, uint
  - color format 0xde: RG, float
* texture format 0x0d: 24 8
* texture format 0x0e: 8 24
* texture format 0x0f: 32
  - color format 0xe3: R, sint
  - color format 0xe4: R, uint
  - color format 0xe5: R, float
  - color format 0xff: Y, uint [XXX]
* texture format 0x21: 11 11 10
  - color format 0xe0: RGB, float

Element size 2:

* texture format 0x12: 4 4 4 4
* texture format 0x13: 1 5 5 5
* texture format 0x14: 5 5 5 1
  - color format 0xe9: BGRA, unorm
  - color format 0xf8: BGRX, unorm
  - color format 0xfb: BGRX, unorm [XXX]
  - color format 0xfc: BGRX, unorm [XXX]
* texture format 0x15: 5 6 5
  - color format 0xe8: BGR, unorm
* texture format 0x16: 5 5 6
* texture format 0x18: 8 8
  - color format 0xea: RG, unorm
  - color format 0xeb: RG, snorm
  - color format 0xec: RG, uint
  - color format 0xed: RG, sint
* texture format 0x1b: 16
  - color format 0xee: R, unorm
  - color format 0xef: R, snorm
  - color format 0xf0: R, sint
  - color format 0xf1: R, uint
  - color format 0xf2: R, float

Element size 1:

* texture format 0x1d: 8
  - color format 0xf3: R, unorm
  - color format 0xf4: R, snorm
  - color format 0xf5: R, sint
  - color format 0xf6: R, uint
  - color format 0xf7: A, unorm
* texture format 0x1e: 4 4


Todo: which component types are valid for a given bitfield size? 


Todo: clarify float encoding for weird sizes 


Shared exponent color format 


A shared exponent color format is like a simple color format, but there's an extra bitfield, called E, that's used as a shared exponent for C0-C2. The remaining three bitfields correspond to the mantissas of C0-C2, respectively. They can be swizzled arbitrarily, but they have to use the float type.


Element size 4: 


* texture format 0x20: 9 9 9 E5 

YUV color formats


These formats are also similar to color formats. However, the components are conventionally called Y, U, V: C0 is known as U, C1 is known as Y, and C2 is known as V. An element represents two pixels, and has 4 bitfields: YA representing the Y value for the first pixel, YB representing the Y value for the second pixel, U representing the U value for both pixels, and V representing the V value for both pixels. There are two YUV formats, differing in bitfield order:


Element size 4: 
* texture format 0x21: U8 YA8 V8 YB8
* texture format 0x22: YA8 U8 YB8 V8


Todo: verify I haven't screwed up the ordering here 


Zeta surface format 


A zeta surface, like a simple color surface, has one element per sample. It contains up to three components: the depth 
component [called Z], optionally the stencil component [called S], and if coverage sampling is in use, the coverage 
component [called C]. 


The Z component can be a 32-bit float, a 24-bit normalized unsigned integer, or [on G200+] a 16-bit normalized
unsigned integer. The S component, if present, is always an 8-bit raw integer. 


The C component is special: if present, it's an 8-bit bitfield in each sample. However, semantically it is a per-pixel value: the C components of the samples are stitched together to obtain the per-pixel value. This stitching process depends on the multisample mode, so the mode has to be specified when binding a coverage sampled zeta surface as a texture. It's not allowed to use a coverage sampling mode with a zeta format without a C component, or the other way around.


Like with color formats, there are two different enums that specify zeta formats: texture formats and zeta formats. However, this time the zeta formats have a one-to-many relationship with texture formats: the texture format contains information about the specific coverage sampling mode used, while the zeta format merely says whether coverage sampling is in use, and the mode is taken from the RT multisample configuration.


For textures, Z corresponds to C0, S to C1, and C to C2. However, C cannot be used together with Z and/or S in a single sampler. Z and S sampling works normally, but when C is sampled, the sampler returns preprocessed weights instead of the raw value; see graph/g80-texture.txt for more information about the sampling process.


The formats are:

Element size 2:
* zeta format 0x13: Z16 [G200+ only]
— texture format 0x3a: Z16 [G200+ only]

Element size 4:
* zeta format 0x0a: Z32
— texture format 0x2f
* zeta format 0x14: S8 Z24
— texture format 0x29
* zeta format 0x15: Z24 X8
— texture format 0x2b
* zeta format 0x16: Z24 S8
— texture format 0x2a
* zeta format 0x18: Z24 C8
— texture format 0x2c: MS4_CS4
— texture format 0x2d: MS8_CS8
— texture format 0x2e: MS4_CS12

Element size 8:
* zeta format 0x19: Z32 S8 X24
— texture format 0x30
* zeta format 0x1d: Z24 X8 S8 C8 X16
— texture format 0x31: MS4_CS4
— texture format 0x32: MS8_CS8
— texture format 0x37: MS4_CS12
* zeta format 0x1e: Z32 X8 C8 X16
— texture format 0x33: MS4_CS4
— texture format 0x34: MS8_CS8
— texture format 0x38: MS4_CS12
* zeta format 0x1f: Z32 S8 C8 X16
— texture format 0x35: MS4_CS4
— texture format 0x36: MS8_CS8
— texture format 0x39: MS4_CS12


Todo: figure out the MS8_CS24 formats


Compressed texture formats 


Todo: write me 


Bitmap surface format 


A bitmap surface has only one component, and the component has 1 bit per sample; that is, the component's value can be either 0 or 1 for each sample in the surface. The surface is made of 8-byte elements, with each element representing an 8x8 block of samples. The element is treated as a 64-bit word, with each sample taking 1 bit. The bits start from the LSB and are ordered first by the x coordinate of the sample, then by its y coordinate.
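The sample ordering within one element can be illustrated with a small Python sketch. It assumes the 8-byte element has already been loaded into a Python integer as the 64-bit word described above; `bitmap_sample` and `bitmap_element` are hypothetical helper names used only for this illustration.

```python
def bitmap_sample(element: int, x: int, y: int) -> int:
    """Return the 0/1 value of sample (x, y) inside one 8x8 bitmap element.

    Bits start at the LSB and x varies fastest, so sample (x, y)
    is stored in bit y * 8 + x of the 64-bit word.
    """
    if not (0 <= x < 8 and 0 <= y < 8):
        raise ValueError("sample coordinates must be within the 8x8 block")
    return (element >> (y * 8 + x)) & 1


def bitmap_element(rows) -> int:
    """Pack 8 rows of 8 samples (indexed rows[y][x]) into a 64-bit element."""
    value = 0
    for y, row in enumerate(rows):
        for x, bit in enumerate(row):
            value |= (bit & 1) << (y * 8 + x)
    return value


print(bitmap_sample(0x2, 1, 0))    # 1: sample x=1, y=0 is bit 1
print(bitmap_sample(0x100, 0, 1))  # 1: sample x=0, y=1 is bit 8
```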


This format can be used for the 2d engine and texturing. When used for texturing, it forces a special "box" filter: the result of sampling is the percentage of "lit" area in a WxH rectangle centered on the sampled location. See graph/g80-texture.txt for more details.

Todo: figure out more. Check how it works with the 2d engine.


The formats are: 
Element size 8: 
* texture format 0x1f: BITMAP
— color format 0x1c: BITMAP


G80 storage types 
On G80, the storage type is made of two parts: the storage type itself, and the compression mode. The storage type is 
a 7-bit enum, the compression mode is a 2-bit enum. 
The compression modes are: 
* 0: NONE - no compression 
* 1: SINGLE - 2 compression tag bits per gob, 1 tag cell per 64kB page
* 2: DOUBLE - 4 compression tag bits per gob, 2 tag cells per 64kB page 


Todo: verify somehow. 


The set of valid compression modes varies with the storage type. NONE is always valid. 


As mentioned before, the low-level rearrangement is further split into two sublevels: short range reordering, rearranging bytes in a single gob, and long range reordering, rearranging gobs. Short range reordering is performed for both VRAM and system RAM, and is highly dependent on the storage type. Long range reordering is done only for VRAM, and has only three types:


* none [NONE] - no reordering, only used for storage type 0 [pitch]
* small scale [SSR] - gobs rearranged inside a single 4kB page, used for non-0 storage types
* large scale [LSR] - large blocks of memory rearranged, based on internal VRAM geometry. Boundaries between VRAM areas using NONE/SSR and LSR need to be properly aligned in physical space to prevent conflicts.


Long range reordering is described in detail in G80:GF100 VRAM structure and usage.
The storage types can be roughly split into the following groups: 

* pitch storage type: used for pitch surfaces and non-surface buffers 

* blocklinear color storage types: used for non-zeta blocklinear surfaces 

* zeta storage types: used for zeta surfaces 


On the original G80, non-0 storage types can only be used on VRAM; on G84 and later cards they can also be used on system RAM. Compression modes other than NONE can only be used on VRAM. However, because of this G80 limitation, blocklinear surfaces stored in system RAM are allowed to use storage type 0, and will work correctly for texturing and m2mf source/destination. Rendering to them with the 2d or 3d engine is impossible, though.


Correct storage types are only enforced by texture units and ROPs [ie. 2d and 3d engine render targets + CUDA global/local/stack spaces], which have dedicated paths to memory and depend on the storage types for performance. The other engines have storage type handling done by the common memory controller logic, and will accept any storage type.


The pitch storage type is:

* storage type 0x00: PITCH
— long range reordering: NONE
— valid compression modes: NONE

There's no short range reordering on this storage type: the offset inside a gob is identical between the virtual and physical addresses.


Blocklinear color storage types 


Todo: reformat 


The following blocklinear color storage types exist: 


* storage type 0x70: BLOCKLINEAR
— long range reordering: SSR
— valid compression modes: NONE
— valid surface formats: any non-zeta with element size of 1, 2, 4, or 8 bytes
— valid multisampling modes: any


* storage type 0x72: BLOCKLINEAR_LSR
— long range reordering: LSR
— valid compression modes: NONE
— valid surface formats: any non-zeta with element size of 1, 2, 4, or 8 bytes
— valid multisampling modes: any


* storage type 0x76: BLOCKLINEAR_128_LSR
— long range reordering: LSR
— valid compression modes: NONE
— valid surface formats: any non-zeta with element size of 16 bytes
— valid multisampling modes: any


[XXX] 


* storage type 0x74: BLOCKLINEAR_128
— long range reordering: SSR
— valid compression modes: NONE
— valid surface formats: any non-zeta with element size of 16 bytes
— valid multisampling modes: any


[XXX] 


* storage type 0x78: BLOCKLINEAR_32_MS4
— long range reordering: SSR
— valid compression modes: NONE, SINGLE
— valid surface formats: any non-zeta with element size of 4 bytes
— valid multisampling modes: MS1, MS2*, MS4*


* storage type 0x79: BLOCKLINEAR_32_MS8
— long range reordering: SSR
— valid compression modes: NONE, SINGLE
— valid surface formats: any non-zeta with element size of 4 bytes
— valid multisampling modes: MS8*


* storage type 0x7a: BLOCKLINEAR_32_MS4_LSR
— long range reordering: LSR
— valid compression modes: NONE, SINGLE
— valid surface formats: any non-zeta with element size of 4 bytes
— valid multisampling modes: MS1, MS2*, MS4*


* storage type 0x7b: BLOCKLINEAR_32_MS8_LSR
— long range reordering: LSR
— valid compression modes: NONE, SINGLE
— valid surface formats: any non-zeta with element size of 4 bytes
— valid multisampling modes: MS8*


[XXX] 


* storage type 0x7c: BLOCKLINEAR_64_MS4
— long range reordering: SSR
— valid compression modes: NONE, SINGLE
— valid surface formats: any non-zeta with element size of 8 bytes
— valid multisampling modes: MS1, MS2*, MS4*


* storage type 0x7d: BLOCKLINEAR_64_MS8
— long range reordering: SSR
— valid compression modes: NONE, SINGLE
— valid surface formats: any non-zeta with element size of 8 bytes
— valid multisampling modes: MS8*


[XXX] 


* storage type 0x44: BLOCKLINEAR_24
— long range reordering: SSR
— valid compression modes: NONE
— valid surface formats: texture format 8 8 8 X8 and corresponding color formats
— valid multisampling modes: any


* storage type 0x45: BLOCKLINEAR_24_MS4
— long range reordering: SSR
— valid compression modes: NONE, SINGLE
— valid surface formats: texture format 8 8 8 X8 and corresponding color formats
— valid multisampling modes: MS1, MS2*, MS4*
* storage type 0x46: BLOCKLINEAR_24_MS8
— long range reordering: SSR
— valid compression modes: NONE, SINGLE
— valid surface formats: texture format 8 8 8 X8 and corresponding color formats
— valid multisampling modes: MS8*


* storage type 0x4b: BLOCKLINEAR_24_LSR
— long range reordering: LSR
— valid compression modes: NONE
— valid surface formats: texture format 8 8 8 X8 and corresponding color formats
— valid multisampling modes: any


* storage type 0x4c: BLOCKLINEAR_24_MS4_LSR
— long range reordering: LSR
— valid compression modes: NONE, SINGLE
— valid surface formats: texture format 8 8 8 X8 and corresponding color formats
— valid multisampling modes: MS1, MS2*, MS4*


* storage type 0x4d: BLOCKLINEAR_24_MS8_LSR
— long range reordering: LSR
— valid compression modes: NONE, SINGLE
— valid surface formats: texture format 8 8 8 X8 and corresponding color formats
— valid multisampling modes: MS8*


[XXX] 


Zeta storage types 


Todo: write me 


GF100 storage types 


Todo: write me 


2.7.8 Tesla virtual memory 


Contents 


* Tesla virtual memory
— Introduction
— VM users
— Channels
— DMA objects
— Page tables
— TLB flushes
— User vs supervisor accesses
— Storage types
— Compression modes
— VM faults

Introduction


G80 generation cards feature an MMU that translates user-visible logical addresses to physical ones. The translation 
has two levels: DMA objects, which behave like x86 segments, and page tables. The translation involves the following 
address spaces: 


logical addresses: 40-bit logical address + channel descriptor address + DMAobj address. Specifies an address that will be translated by the relevant DMAobj, and then by the page tables if the DMAobj says so. All addresses appearing in FIFO command streams are logical addresses, or are eventually translated to logical addresses.


virtual addresses: 40-bit virtual address + channel descriptor address, specifies an address that will be looked 
up in the page tables of the relevant channel. Virtual addresses are always a result of logical address translation 
and can never be specified directly. 


linear addresses: 40-bit linear address + target specifier, which can be VRAM, SYSRAM_SNOOP, or SYSRAM_NOSNOOP. They can refer to:


— VRAM: 32-bit linear addresses [the high 8 bits are ignored], the on-board memory of the card. Supports LSR and compression. See G80:GF100 VRAM structure and usage.


— SYSRAM: 40-bit linear addresses. Accessing this space will cause the card to invoke PCIE read/write transactions to the given bus address, allowing it to access system RAM or other PCI devices' memory. SYSRAM_SNOOP uses normal PCIE transactions, SYSRAM_NOSNOOP uses PCIE transactions with the "no snoop" bit set.


Mostly, linear addresses are a result of logical address translation, but some memory areas are specified directly 
by their linear addresses. 


12-bit tag addresses: select a cell in hidden compression tag RAM, used for compressed areas of VRAM. See 
G80 VRAM compression 


physical address: for VRAM, the partition/subpartition/row/bank/column coordinates of a memory cell; for 
SYSRAM, the final bus address 


Todo: kill this list in favor of an actual explanation 


The VM's job is to translate a logical address into its associated data: 


linear address 

target: VRAM, SYSRAM_SNOOP, or SYSRAM_NOSNOOP
read-only flag 

supervisor-only flag 


storage type: a special value that selects the internal structure of contained data and enables more efficient 
accesses by increasing cache locality 


compression mode: if set, write accesses will attempt to compress the written data and, if successful, write only a fraction of the original write size to memory and mark the tile as compressed in the hidden tag memory. Read accesses will transparently decompress the data. Can only be used on VRAM.


compression tag address: the address of the tag cell to be used if compression is enabled. Tag memory is addressed by "cells". Each cell is actually 0x200 tag bits. For SINGLE compression mode, every 0x10000 bytes of compressed VRAM require 1 tag cell. For DOUBLE compression mode, every 0x10000 bytes of compressed VRAM require 2 tag cells.


partition cycle: either short or long, affecting low-level VRAM storage 


encryption flag [G84+]: for SYSRAM, causes data to be encrypted with a simple cipher before being stored 
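As a cross-check of the tag cell sizing, the Python sketch below derives the cell count from the per-gob tag bit figures given for the G80 compression modes (2 bits per 256-byte gob for SINGLE, 4 for DOUBLE) and the 0x200-bit cell size. With 0x100 gobs per 0x10000-byte page, it reproduces the 1 and 2 cells per 0x10000 bytes stated above. The function name and the string mode names are illustrative assumptions, not an existing API.

```python
GOB_SIZE = 0x100        # 256-byte gob
LARGE_PAGE = 0x10000    # 64kiB large page
CELL_BITS = 0x200       # tag bits in one tag cell
TAG_BITS_PER_GOB = {"NONE": 0, "SINGLE": 2, "DOUBLE": 4}


def tag_cells_needed(size: int, mode: str) -> int:
    """Return how many tag cells `size` bytes of VRAM need in compression `mode`."""
    if size % LARGE_PAGE:
        # compression is meant to be enabled per whole large page
        raise ValueError("size must be a multiple of 0x10000 bytes")
    tag_bits = (size // GOB_SIZE) * TAG_BITS_PER_GOB[mode]
    return tag_bits // CELL_BITS


print(tag_cells_needed(0x10000, "SINGLE"))  # 1
print(tag_cells_needed(0x10000, "DOUBLE"))  # 2
```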

A VM access can also end unsuccessfully for multiple reasons, such as a non-present page. When that happens, a VM fault is triggered: the faulting access data is stored, and the fault condition is reported to the requesting engine. The consequences of a faulted access depend on the engine.


VM users 


VM is used by several clients, which are identified by VM client id: 


A related concept is VM engine, which is a group of clients that share TLBs and stay on the same channel at any single 
moment. It's possible for a client to be part of several VM engines. The engines are: 


Client+engine combination doesn't, however, fully identify the source of the access - to disambiguate that, DMA slot 
ids are used. The set of DMA slot ids depends on both engine and client id. The DMA slots are [engine/client/slot]: 


* 0/0/0: PGRAPH STRMOUT
* 0/3/0: PGRAPH context
* 0/3/1: PGRAPH NOTIFY
* 0/3/2: PGRAPH QUERY
* 0/3/3: PGRAPH COND
* 0/3/4: PGRAPH m2mf BUFFER_IN
* 0/3/5: PGRAPH m2mf BUFFER_OUT
* 0/3/6: PGRAPH m2mf BUFFER_NOTIFY
* 0/5/0: PGRAPH CODE_CB
* 0/5/1: PGRAPH TIC
* 0/5/2: PGRAPH TSC
* 0/7/0: PGRAPH CLIPID
* 0/9/0: PGRAPH VERTEX
* 0/a/0: PGRAPH TEXTURE / SRC2D
* 0/b/0-7: PGRAPH RT 0-7
* 0/b/8: PGRAPH ZETA
* 0/b/9: PGRAPH LOCAL
* 0/b/a: PGRAPH GLOBAL
* 0/b/b: PGRAPH STACK
* 0/b/c: PGRAPH DST2D
* 4/4/0: PEEPHOLE write
* 4/8/0: PEEPHOLE read
* 6/4/0: BAR1 write
* 6/8/0: BAR1 read
* 6/4/1: BAR3 write
* 6/8/1: BAR3 read
* 5/8/0: FIFO pushbuf read
* 5/4/1: FIFO semaphore write
* 5/8/1: FIFO semaphore read
* c/8/1: FIFO background semaphore read
* 1/6/8: PVP1 context [G80:G84]
* 7116/4: PME context [G80:G84]
* 8/6/1: PMPEG CMD [G80:G98 G200:MCP77]
* 8/6/2: PMPEG DATA [G80:G98 G200:MCP77]
* 8/6/3: PMPEG IMAGE [G80:G98 G200:MCP77]
* 8/6/4: PMPEG context [G80:G98 G200:MCP77]
* 8/6/5: PMPEG QUERY [G84:G98 G200:MCP77]
* b/f/0: PCOUNTER record buffer [G84:GF100]
* 1/c/0-f: PVP2 DMA ports 0-0xf [G84:G98 G200:MCP77]
* 9/d/0-f: PBSP DMA ports 0-0xf [G84:G98 G200:MCP77]
* a/e/0: PCIPHER context [G84:G98 G200:MCP77]
* a/e/1: PCIPHER SRC [G84:G98 G200:MCP77]
* a/e/2: PCIPHER DST [G84:G98 G200:MCP77]
* a/e/3: PCIPHER QUERY [G84:G98 G200:MCP77]
* 1/c/0-7: PPDEC falcon ports 0-7 [G98:G200 MCP77-]
* 8/6/0-7: PPPP falcon ports 0-7 [G98:G200 MCP77-]
* 9/4/0-7: PVLD falcon ports 0-7 [G98:G200 MCP77-]
* a/e/0-7: PSEC falcon ports 0-7 [G98:GT215]
* d/13/0-7: PCOPY falcon ports 0-7 [GT215-]
* e/11/0-7: PDAEMON falcon ports 0-7 [GT215-]
* 7/14/0-7: PVCOMP falcon ports 0-7 [MCP89-]


Todo: PVP1

Todo: PME

Todo: Move to engine doc?


Channels 


All VM accesses are done on behalf of some "channel". A VM channel is just a memory structure that contains the DMA objects and page directory. A VM channel can also be a FIFO channel, used by PFIFO and the fifo engines and containing other data structures, or just a "bare" VM channel used with non-fifo engines.

A channel is identified by a "channel descriptor", which is a 30-bit number that points to the base of the channel memory structure:

* bits 0-27: bits 12-39 of the channel memory structure linear address
* bits 28-29: the target specifier for the channel memory structure:
— 0: VRAM
— 1: invalid, do not use
— 2: SYSRAM_SNOOP
— 3: SYSRAM_NOSNOOP
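A Python sketch of the descriptor layout, for illustration only: `decode_channel_descriptor` and `encode_channel_descriptor` are hypothetical helpers, and the target names simply follow the specifier values listed above.

```python
TARGETS = {0: "VRAM", 2: "SYSRAM_SNOOP", 3: "SYSRAM_NOSNOOP"}  # 1 is invalid


def decode_channel_descriptor(desc: int):
    """Split a 30-bit channel descriptor into (linear address, target name)."""
    if desc >> 30:
        raise ValueError("channel descriptor is a 30-bit value")
    address = (desc & 0x0FFFFFFF) << 12   # bits 0-27 -> address bits 12-39
    target_id = (desc >> 28) & 3          # bits 28-29
    if target_id not in TARGETS:
        raise ValueError("target specifier 1 is invalid")
    return address, TARGETS[target_id]


def encode_channel_descriptor(address: int, target: str) -> int:
    """Inverse of decode_channel_descriptor; the structure is 0x1000-aligned."""
    if address & 0xFFF or address >> 40:
        raise ValueError("address must be 0x1000-aligned and fit in 40 bits")
    target_id = {name: tid for tid, name in TARGETS.items()}[target]
    return (target_id << 28) | (address >> 12)


print(hex(encode_channel_descriptor(0x123000, "SYSRAM_SNOOP")))  # 0x20000123
```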


The channel memory structure contains a few fixed-offset elements, and also serves as a container for channel objects, such as DMA objects, that can be placed anywhere inside the structure. Because of the channel objects inside it, the channel structure has no fixed size, although the maximal address of channel objects is 0xffff0. The channel structure has to be aligned to 0x1000 bytes.


The original G80 channel structure has the following fixed elements: 
* 0x000-0x200: RAMFC [fifo channels only]
* 0x200-0x400: DMA objects for fifo engines' contexts [fifo channels only] 
* 0x400-0x1400: PFIFO CACHE [fifo channels only] 
* 0x1400-0x5400: page directory 

G84- cards instead use the following structure: 
* 0x000-0x200: DMA objects for fifo engines' contexts [fifo channels only] 
* 0x200-0x4200: page directory 


The channel objects are specified by 16-bit offsets from the start of the channel structure in 0x10-byte units.


DMA objects 


The only channel object type that VM subsystem cares about is DMA objects. DMA objects represent contiguous 
segments of either virtual or linear memory and are the first stage of VM address translation. DMA objects can be 
paged or unpaged. Unpaged DMA objects directly specify the target space and all attributes, merely adding the base 
address and checking the limit. Paged DMA objects add the base address, then look it up in the page tables. Attributes 
can either come from page tables, or be individually overridden by the DMA object.


DMA objects are specified by 16-bit "selectors". In the case of fifo engines, RAMHT is used to translate from user-visible 32-bit handles to the selectors [see RAMHT and the FIFO objects]. The selector is shifted left by 4 bits and added to the channel structure base to obtain the address of the DMAobj structure, which is 0x18 bytes long and made of 32-bit LE words:


word 0: 


* bits 0-15: object class. Ignored by VM, but usually validated by fifo engines; should be 0x2 [read-only], 0x3 [write-only], or 0x3d [read-write]


* bits 16-17: target specifier: 


— 0: VM - paged object - the logical address is to be added to the base address to obtain a virtual address, then the virtual address should be translated via the page tables
— 1: VRAM - unpaged object - the logical address should be added to the base address to directly obtain the linear address in VRAM
— 2: SYSRAM_SNOOP - like VRAM, but gives a SYSRAM address
— 3: SYSRAM_NOSNOOP - like VRAM, but gives a SYSRAM address and uses nosnoop transactions
* bits 18-19: read-only flag 

— 0: use read-only flag from page tables [paged objects only] 
— 1: read-only
— 2: read-write
* bits 20-21: supervisor-only flag
— 0: use supervisor-only flag from page tables [paged objects only]
— 1: user-supervisor
— 2: supervisor-only
* bits 22-28: storage type. If the value is 0x7f, use storage type from page tables, otherwise directly specifies the storage type
* bits 29-30: compression mode
— 0: no compression
— 1: SINGLE compression
— 2: DOUBLE compression
— 3: use compression mode from page tables
* bit 31: if set, is a supervisor DMA object, user DMA object otherwise
word 1: bits 0-31 of limit address
word 2: bits 0-31 of base address
word 3:
* bits 0-7: bits 32-39 of base address
* bits 24-31: bits 32-39 of limit address
word 4:
* bits 0-11: base tag address
* bits 16-27: limit tag address
word 5:
* bits 0-15: compression base address bits 16-31 [bits 0-15 are forced to 0]
* bits 16-17: partition cycle
— 0: use partition cycle from page tables
— 1: short cycle
— 2: long cycle
* bits 18-19 [G84-]: encryption flag
— 0: not encrypted
— 1: encrypted
— 2: use encryption flag from page tables
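To make the word layout concrete, here is a Python sketch that locates a DMA object from its selector and unpacks its 0x18 bytes (six little-endian 32-bit words, as described above). The `DmaObject` class and its field names are invented for this illustration; raw enum values are kept as numbers rather than mapped to names.

```python
import struct
from dataclasses import dataclass


@dataclass
class DmaObject:
    obj_class: int        # word 0 bits 0-15
    target: int           # 0: VM (paged), 1: VRAM, 2: SYSRAM_SNOOP, 3: SYSRAM_NOSNOOP
    read_only: int        # 0: from page tables, 1: read-only, 2: read-write
    supervisor_only: int  # 0: from page tables, 1: user-supervisor, 2: supervisor-only
    storage_type: int     # 0x7f: take storage type from page tables
    comp_mode: int        # 0: none, 1: SINGLE, 2: DOUBLE, 3: from page tables
    supervisor_obj: bool  # word 0 bit 31
    limit: int            # 40-bit limit address
    base: int             # 40-bit base address
    base_tag: int         # word 4 bits 0-11
    limit_tag: int        # word 4 bits 16-27
    comp_base: int        # compression base address, low 16 bits forced to 0
    partition_cycle: int  # 0: from page tables, 1: short, 2: long
    encryption: int       # [G84-] 0: off, 1: on, 2: from page tables


def dmaobj_address(channel_base: int, selector: int) -> int:
    """The selector is shifted left by 4 and added to the channel base."""
    return channel_base + (selector << 4)


def decode_dmaobj(data: bytes) -> DmaObject:
    """Unpack the six LE words of a DMA object structure."""
    w = struct.unpack("<6I", data[:0x18])
    return DmaObject(
        obj_class=w[0] & 0xFFFF,
        target=(w[0] >> 16) & 3,
        read_only=(w[0] >> 18) & 3,
        supervisor_only=(w[0] >> 20) & 3,
        storage_type=(w[0] >> 22) & 0x7F,
        comp_mode=(w[0] >> 29) & 3,
        supervisor_obj=bool(w[0] >> 31),
        limit=w[1] | (((w[3] >> 24) & 0xFF) << 32),
        base=w[2] | ((w[3] & 0xFF) << 32),
        base_tag=w[4] & 0xFFF,
        limit_tag=(w[4] >> 16) & 0xFFF,
        comp_base=(w[5] & 0xFFFF) << 16,
        partition_cycle=(w[5] >> 16) & 3,
        encryption=(w[5] >> 18) & 3,
    )
```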


First, the DMA object selector is compared with 0. If the selector is 0, a NULL_DMAOBJ fault happens. Then, the logical address is added to the base address from the DMA object. The resulting address is compared with the limit address from the DMA object and, if larger or equal, a DMAOBJ_LIMIT fault happens. If the DMA object is paged, the address is looked up in the page tables, with the read-only flag, supervisor-only flag, storage type, and compression mode optionally overridden as specified by the DMA object. Otherwise, the address directly becomes the linear address. For compressed unpaged VRAM objects, the tag address is computed as follows:
* take the computed VRAM linear address and subtract the compression base address from it. If the result is negative, force compression mode to none
* shift the result right by 16 bits
* add the base tag address to the result
* if the result <= limit tag address, this is the tag address to use. Otherwise, force compression mode to none.
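The selector check, limit check and tag address computation for unpaged objects can be sketched as below. This is a simplified model under stated assumptions: only the unpaged path is shown, the caller is assumed to pass a compression mode only for VRAM targets, and `VmFault` and `translate_unpaged` are hypothetical stand-ins for the hardware's behavior, not an existing API.

```python
class VmFault(Exception):
    """Hypothetical stand-in for a VM fault; the message names the fault type."""


def translate_unpaged(selector, logical, base, limit,
                      comp_mode=0, comp_base=0, base_tag=0, limit_tag=0):
    """Return (linear address, effective compression mode, tag address or None)."""
    if selector == 0:
        raise VmFault("NULL_DMAOBJ")
    linear = base + logical
    if linear >= limit:                 # "larger or equal" faults
        raise VmFault("DMAOBJ_LIMIT")
    tag = None
    if comp_mode in (1, 2):             # SINGLE or DOUBLE
        offset = linear - comp_base
        if offset >= 0:
            candidate = base_tag + (offset >> 16)
            if candidate <= limit_tag:
                tag = candidate
        if tag is None:
            comp_mode = 0               # fall back to no compression
    return linear, comp_mode, tag


# 0x30000 bytes past the compression base -> third 64kiB unit after base_tag
print(translate_unpaged(1, 0x30000, 0x100000, 0x200000, 1, 0x100000, 5, 10))
```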

Places where DMA objects are bound, that is MMIO registers or FIFO methods, are commonly called "DMA slots".


Most engines cache the most recently bound DMA object. To flush the caches, it's usually enough to rewrite the 
selector register, or resubmit the selector method. 


It should be noted that many engines require the DMA object's base address to be of some specific alignment. The 
alignment depends on the engine and slot. 


The fifo engine context dmaobjs are a special set of DMA objects worth mentioning. They're used by the fifo engines 
to store per-channel state while a given channel is inactive on the relevant engine. Their size and structure depend on
the engine. They have fixed selectors, and hence reside at fixed positions inside the channel structure. On the original 
G80, the objects are: 


Selector | Address | Engine
0x0020 | 0x00200 | PGRAPH
0x0022 | 0x00220 | PVP1
0x0024 | 0x00240 | PME
0x0026 | 0x00260 | PMPEG


On G84+ cards, they are: 


Selector | Address | Present on | Engine
0x0002 | 0x00020 | all | PGRAPH
0x0004 | 0x00040 | VP2 | PVP2
0x0004 | 0x00040 | VP3- | PPDEC
0x0006 | 0x00060 | VP2 | PMPEG
0x0006 | 0x00060 | VP3- | PPPP
0x0008 | 0x00080 | VP2 | PBSP
0x0008 | 0x00080 | VP3- | PVLD
0x000a | 0x000a0 | VP2 | PCIPHER
0x000a | 0x000a0 | VP3 | PSEC
0x000a | 0x000a0 | MCP89- | PVCOMP
0x000c | 0x000c0 | GT215- | PCOPY


Page tables 


If a paged DMA object is used, the virtual address is further looked up in the page tables. The page tables are two-level. The top level is a 0x800-entry page directory, where each entry covers 0x20000000 bytes of virtual address space. The page directory is embedded in the channel structure. It starts at offset 0x1400 on the original G80, and at 0x200 on G84+. Each page directory entry, or PDE, is 8 bytes long. The PDEs point to page tables and specify the page table attributes. Each page table can use either small, medium [GT215-] or large pages. Small pages are 0x1000 bytes long, medium pages are 0x4000 bytes long, and large pages are 0x10000 bytes long. For small-page page tables, the size of the page table can be artificially limited to cover only 0x2000, 0x4000, or 0x8000 pages instead of the full 0x20000 pages; the pages over this limit will fault. Medium- and large-page page tables always cover the full 0x8000 or 0x2000 entries, respectively. Page tables of all kinds are made of 8-byte page table entries, or PTEs.
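The index arithmetic implied by these sizes can be sketched as follows. Since each PDE covers 0x20000000 bytes, the PDE index is bits 29-39 of the virtual address, and the PTE index is the low 29 bits divided by the page size. The helpers are hypothetical; `pt_entries` models the optional size limit of small-page tables, and an out-of-range page, which would fault on the hardware, is reported here as a `ValueError`.

```python
PAGE_SHIFT = {"small": 12, "medium": 14, "large": 16}


def split_virtual(va: int, page_size: str, pt_entries=None):
    """Return (PDE index, PTE index, offset in page) for a 40-bit virtual address."""
    if va >> 40:
        raise ValueError("virtual addresses are 40-bit")
    shift = PAGE_SHIFT[page_size]
    pde_index = va >> 29                      # each PDE covers 0x20000000 bytes
    pte_index = (va & 0x1FFFFFFF) >> shift
    # full table size: 0x20000, 0x8000 or 0x2000 entries
    entries = pt_entries if pt_entries is not None else 1 << (29 - shift)
    if pte_index >= entries:
        raise ValueError("page is beyond the page table limit and would fault")
    return pde_index, pte_index, va & ((1 << shift) - 1)


def pde_address(channel_base: int, pde_index: int, original_g80: bool) -> int:
    """The page directory starts at 0x1400 on the original G80, 0x200 on G84+."""
    return channel_base + (0x1400 if original_g80 else 0x200) + 8 * pde_index


print([hex(v) for v in split_virtual(0x223456789, "small")])  # ['0x11', '0x3456', '0x789']
```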

Todo: verify GT215 transition for medium pages


The PDEs are made of two 32-bit LE words, and have the following format:
word 0:
* bits 0-1: page table presence and page size
— 0: page table not present
— 1: large pages [64kiB]
— 2: medium pages [16kiB] [GT215-]
— 3: small pages [4kiB]
* bits 2-3: target specifier for the page table itself
— 0: VRAM
— 1: invalid, do not use
— 2: SYSRAM_SNOOP
— 3: SYSRAM_NOSNOOP
* bit 4: ??? [XXX: figure this out]
* bits 5-6: page table size [small pages only]
— 0: 0x20000 entries [full]
— 1: 0x8000 entries
— 2: 0x4000 entries
— 3: 0x2000 entries
* bits 12-31: page table linear address bits 12-31
word 1:
* bits 32-39: page table linear address bits 32-39
The page table start address has to be aligned to 0x1000 bytes.
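A Python sketch of PDE decoding, assuming the two words are combined as `word0 | word1 << 32` so that the bit numbers above apply directly. The function and dictionary keys are illustrative names only; bit 4 and the other undocumented bits are ignored.

```python
PDE_PAGE_SIZE = {1: "large", 2: "medium", 3: "small"}     # 0: not present
SMALL_PT_ENTRIES = {0: 0x20000, 1: 0x8000, 2: 0x4000, 3: 0x2000}
FIXED_PT_ENTRIES = {"medium": 0x8000, "large": 0x2000}


def decode_pde(word0: int, word1: int):
    """Decode a PDE from its two 32-bit words; return None if not present."""
    pde = word0 | (word1 << 32)
    page_size = PDE_PAGE_SIZE.get(pde & 3)
    if page_size is None:
        return None
    if page_size == "small":
        entries = SMALL_PT_ENTRIES[(pde >> 5) & 3]    # bits 5-6
    else:
        entries = FIXED_PT_ENTRIES[page_size]
    return {
        "page_size": page_size,
        "target": (pde >> 2) & 3,                     # 0: VRAM, 2/3: SYSRAM
        "pt_address": pde & 0xFFFFFFF000,             # bits 12-39
        "entries": entries,
    }


print(decode_pde(0x34567043, 0x12))
```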
The PTEs are made of two 32-bit LE words, and have the following format:
word 0:
* bit 0: page present
* bits 1-2: ??? [XXX: figure this out]
* bit 3: read-only flag
* bits 4-5: target specifier
— 0: VRAM
— 1: invalid, do not use
— 2: SYSRAM_SNOOP
— 3: SYSRAM_NOSNOOP
* bit 6: supervisor-only flag
* bits 7-9: log2 of contig block size in pages [see below]
* bits 12-31: bits 12-31 of linear address [small pages]
* bits 14-31: bits 14-31 of linear address [medium pages]
* bits 16-31: bits 16-31 of linear address [large pages]
word 1:
* bits 32-39: bits 32-39 of linear address
* bits 40-46: storage type
* bits 47-48: compression mode
* bits 49-60: compression tag address
* bit 61: partition cycle
— 0: short cycle
— 1: long cycle
* bit 62 [G84-]: encryption flag
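The PTE layout can be decoded the same way. This sketch assumes the PTE has already been assembled as a 64-bit integer (`word0 | word1 << 32`) and that the caller knows the page size from the PDE; `decode_pte` and its keys are hypothetical names, and the unknown bits 1-2 are ignored.

```python
def decode_pte(pte: int, page_shift: int):
    """Decode a 64-bit PTE; page_shift is 12, 14 or 16 for small/medium/large."""
    if not pte & 1:
        return None                                   # page not present
    # linear address bits run from page_shift up to bit 39
    address_mask = ((1 << 40) - 1) & ~((1 << page_shift) - 1)
    return {
        "read_only": bool((pte >> 3) & 1),
        "target": (pte >> 4) & 3,
        "supervisor_only": bool((pte >> 6) & 1),
        "contig_order": (pte >> 7) & 7,
        "address": pte & address_mask,
        "storage_type": (pte >> 40) & 0x7F,
        "comp_mode": (pte >> 47) & 3,
        "tag": (pte >> 49) & 0xFFF,
        "long_partition_cycle": bool((pte >> 61) & 1),
        "encrypted": bool((pte >> 62) & 1),           # G84+ only
    }
```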


Contig blocks are a special feature of PTEs used to save TLB space. When 2^o adjacent pages starting on a 2^o-page-aligned boundary map to contiguous linear addresses [and, if appropriate, contiguous tag addresses] and have identical other attributes, they can be marked as a contig block of order o, where o is 0-7. To do this, all PTEs for that range should have bits 7-9 set equal to o, and the linear/tag address fields set to the linear/tag address of the first page in the contig block [ie. all PTEs belonging to the contig block should be identical]. The starting linear address need not be aligned to the contig block size, but the virtual address has to be.
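A minimal sketch of filling in a contig block, assuming the caller has already verified that the pages really are contiguous and share their other attributes. `make_contig_ptes` is a hypothetical helper: it only checks the virtual alignment rule and sets bits 7-9, producing the identical PTEs the rule above calls for.

```python
def make_contig_ptes(first_pte: int, order: int, first_page_index: int):
    """Return the 2**order identical PTEs forming one contig block.

    first_pte describes the first page (its linear/tag address);
    first_page_index is the virtual page number, which must be aligned
    to the block size. The linear address itself need not be aligned.
    """
    if not 0 <= order <= 7:
        raise ValueError("contig order must be 0-7")
    count = 1 << order
    if first_page_index % count:
        raise ValueError("virtual address must be aligned to the contig block size")
    pte = (first_pte & ~(7 << 7)) | (order << 7)     # bits 7-9 = order
    return [pte] * count


print([hex(p) for p in make_contig_ptes(0x1234567001, 2, 8)])
```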


TLB flushes

The page table contents are cached in per-engine TLBs. To flush TLB contents, the TLB flush register 0x100c80 should be used:
MMIO 0x100c80:
* bit 0: trigger. When set, triggers the TLB flush. Will auto-reset to 0 when the flush is complete.
* bits 16-19: VM engine to flush

A flush consists of writing engine << 16 | 1 to this register and waiting until bit 0 becomes 0. However, note that G86 PGRAPH has a bug that can result in a lockup if a PGRAPH TLB flush is initiated while PGRAPH is running; see graph/g80-pgraph.txt for details.
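The flush sequence can be sketched in Python as follows. `read32` and `write32` are hypothetical caller-supplied MMIO accessors, and the timeout is an added safety assumption rather than documented behavior. The sketch does not work around the G86 PGRAPH bug mentioned above.

```python
import time

TLB_FLUSH_REG = 0x100C80


def flush_tlb(read32, write32, engine: int, timeout: float = 1.0) -> None:
    """Flush one VM engine's TLB via MMIO 0x100c80.

    read32(addr) and write32(addr, value) are supplied by the caller.
    """
    if not 0 <= engine <= 0xF:
        raise ValueError("VM engine number occupies bits 16-19")
    write32(TLB_FLUSH_REG, (engine << 16) | 1)        # bit 0 triggers the flush
    deadline = time.monotonic() + timeout
    while read32(TLB_FLUSH_REG) & 1:                  # bit 0 clears when done
        if time.monotonic() > deadline:
            raise TimeoutError("TLB flush did not complete")
```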


User vs supervisor accesses 


Todo: write me 


Storage types 


Todo: write me 

Compression modes


Todo: write me 


VM faults 


Todo: write me 


2.7.9 G80:GF100 VRAM structure and usage 


Contents 


* G80:GF100 VRAM structure and usage
— Introduction
— Partition cycle
* Tag memory addressing
— Subpartition cycle
— Row/bank/column split
— Bank cycle
— Storage types


Introduction 


The basic structure of G80 memory is similar to other card generations and is described in Memory structure.


There are two sub-generations of G80 memory controller: the original G80 one and the GT215 one. The G80 memory controller was designed for DDR2 and GDDR3 memory. It's split into several [1-8] partitions, each of them having a 64-bit memory bus. The GT215 memory controller added support for DDR3 and GDDR5 memory and split the partitions into two subpartitions, each of them having a 32-bit memory bus.


On G80, the combination of DDR2/GDDR3 [ie. 4n prefetch] memory with a 64-bit memory bus results in a 32-byte minimal transfer size. For that reason, 32-byte units are called sectors. On GT215, DDR3/GDDR5 [ie. 8n prefetch] memory with a 32-bit memory bus gives the same figure.
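The sector size follows from multiplying the prefetch depth by the bus width in bytes. A trivial Python check of both configurations (the function name is illustrative):

```python
def min_transfer_bytes(prefetch: int, bus_width_bits: int) -> int:
    """Bytes moved by one minimal burst: prefetch depth times bus width."""
    return prefetch * bus_width_bits // 8


print(min_transfer_bytes(4, 64))   # G80: DDR2/GDDR3 on a 64-bit partition -> 32
print(min_transfer_bytes(8, 32))   # GT215: DDR3/GDDR5 on a 32-bit subpartition -> 32
```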


The next level of granularity for memory is 256-byte gobs. Memory is always assigned to partitions in units of whole gobs: all addresses in a gob will stay in a single partition. Also, format-dependent memory address reordering is applied within a gob.


The final fixed level of VRAM granularity is a 0x10000-byte [64kiB] large page. While G80 VM supports using smaller page sizes for VRAM, certain features [compression, long partition cycle] should only be enabled on a per-large-page basis.


Apart from VRAM, the memory controller uses so-called tag RAM, which is used for compression. Compression is a 
feature that allows a memory block to be stored in a more efficient manner [eg. using 2 sectors instead of the normal 
8] if its contents are sufficiently regular. The tag RAM is used to store the compression information for each block: 
whether it's compressed, and if so, in what way. Note that compression is only meant to save memory bandwidth, not
memory capacity: the sectors saved by compression don't have to be transmitted over the memory link, but they're still 
assigned to that block and cannot be used for anything else. The tag RAM is allocated in units of tag cells, which have 
varying size depending on the partition number, but always correspond to 1 or 2 large pages, depending on format. 


VRAM is addressed by 32-bit linear addresses. Some memory attributes affecting low-level storage are stored together 
with the linear address in the page tables [or linear DMA object]. These are: 


* storage type: a 7-bit enumerated value that describes the memory purpose and low-level storage within a block, 
and also selects whether normal or alternative bank cycle is used 


* compression mode: a 2-bit field selecting whether the memory is: 
— not compressed, 
— compressed with 2 tag bits per block [1 tag cell per large page], or 
— compressed with 4 tag bits per block [2 tag cells per large page] 

* compression tag cell: a 12-bit index into the available tag memory, used for compressed memory 

* partition cycle: a 1-bit field selecting whether the short [1 block] or long [4 blocks] partition cycle is used

The linear addresses are transformed in the following steps: 
1. The address is split into the block index [high 24 bits], and the offset inside the block [low 8 bits]. 


2. The block index is transformed to partition ID and partition block index. The process depends on whether the storage type is blocklinear or pitch and on the partition cycle selected. If compression is enabled, the tag cell index is also translated to partition tag bit index.


3. [GT215+ only] The partition block index is translated into subpartition ID and subpartition block index. If 
compression is enabled, partition tag bit index is also translated to subpartition tag bit index. 


4. [Sub]partition block index is split into row/bank/column fields. 


5. Row and bank indices are transformed according to the bank cycle. This process depends on whether the storage 
type selects the normal or alternate bank cycle. 


6. Depending on storage type and the compression tag contents, the offset in the block may refer to varying bytes 
inside the block, and the data may be transformed due to compression. When the required transformed block 
offsets have been determined, they're split into the remaining low column bits and offset inside memory word. 


Partition cycle 
Partition cycle is the first address transformation. Its purpose is converting linear [global] addressing to partition index 
and per-partition addressing. The inputs to this process are: 
* the block index [ie. bits 8-31 of linear VRAM address] 
* partition cycle selected [short or long] 
* pitch or blocklinear mode - pitch is used when storage type is PITCH, blocklinear for all other storage types 
* partition count in the system [as selected by PBUS HWUNITS register] 
The outputs of this process are: 
* partition ID 


* partition block index 
Partition pre-ID and ID adjust are intermediate values in this process.


On G80 [and G80 only], there are two partition cycles available: a short one and a long one. The short one switches
partitions every block, while the long one switches partitions roughly every 4 blocks. However, to make sure addresses
don't “bleed” across large page boundaries, the long partition cycle reverts to switching partitions every block near
large page boundaries:


if partition_cycle == LONG and gpu == G80:
    # round down to a multiple of 4 * partition_count
    group_start = block_index // (4 * partition_count) * 4 * partition_count
    group_end = group_start + 4 * partition_count - 1
    # check whether the group is entirely within one large page
    use_long_cycle = (group_start & ~0xff) == (group_end & ~0xff)
else:
    use_long_cycle = False
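To see the boundary rule in action, here is a minimal runnable Python sketch of the check above. The function name `use_long_cycle_for` and its two boolean parameters are hypothetical stand-ins for the selected partition cycle and the GPU generation; the `~0xff` mask is taken directly from the pseudocode.

```python
def use_long_cycle_for(block_index: int, partition_count: int,
                       long_selected: bool, is_g80: bool) -> bool:
    """Hypothetical helper: True if the long partition cycle applies.

    block_index is bits 8-31 of the linear VRAM address.
    """
    if not (long_selected and is_g80):
        # Short cycle requested, or a G84+ GPU that ignores the setting.
        return False
    group = 4 * partition_count
    # Round down to a multiple of 4 * partition_count blocks.
    group_start = block_index // group * group
    group_end = group_start + group - 1
    # Long cycle only if both ends share block index bits 8 and up,
    # i.e. the whole group lies within one large page.
    return (group_start & ~0xff) == (group_end & ~0xff)


# With 3 partitions, groups are 12 blocks long.
print(use_long_cycle_for(250, 3, True, True))  # True: group 240-251
print(use_long_cycle_for(252, 3, True, True))  # False: group 252-263 crosses 0x100
```

The second call shows the fallback: the group containing block 252 straddles a large page boundary, so the short cycle is used there.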


On G84+, long partition cycle is no longer supported - short cycle is used regardless of the setting. 


Todo: verify it's really the G84 


When short partition cycle is selected, the partition pre-ID and partition block index are calculated by simple division.
The partition ID adjust is the low 5 bits of partition block index:


if not use_long_cycle:
    partition_preid = block_index % partition_count
    partition_block_index = block_index // partition_count
    partition_id_adjust = partition_block_index & 0x1f


When long partition cycle is selected, the same calculation is performed, but with bits 2-23 of block index, and the 
resulting partition block index is merged back with bits 0-1 of block index: 


if use_long_cycle:
    quadblock_index = block_index >> 2
    partition_preid = quadblock_index % partition_count
    partition_quadblock_index = quadblock_index // partition_count
    partition_id_adjust = partition_quadblock_index & 0x1f
    partition_block_index = partition_quadblock_index << 2 | (block_index & 3)
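Both divisions can be combined into one runnable Python sketch. The function name `partition_split` is hypothetical; it returns the three intermediate values in the order (partition pre-ID, partition block index, partition ID adjust) and simply takes `long_cycle` as already decided.

```python
from typing import Tuple


def partition_split(block_index: int, partition_count: int,
                    long_cycle: bool) -> Tuple[int, int, int]:
    """Return (partition_preid, partition_block_index, partition_id_adjust)."""
    if not long_cycle:
        # Short cycle: consecutive blocks go to consecutive partitions.
        preid = block_index % partition_count
        pblock = block_index // partition_count
        adjust = pblock & 0x1f
    else:
        # Long cycle: the same division, applied to groups of 4 blocks.
        quad = block_index >> 2
        preid = quad % partition_count
        pquad = quad // partition_count
        adjust = pquad & 0x1f
        # Merge bits 0-1 of the block index back in.
        pblock = pquad << 2 | (block_index & 3)
    return preid, pblock, adjust


short = [partition_split(b, 3, False)[0] for b in range(8)]
long_ = [partition_split(b, 3, True)[0] for b in range(12)]
print(short)  # [0, 1, 2, 0, 1, 2, 0, 1]
print(long_)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

The two printed lists show the difference between switching partitions every block and every 4 blocks.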


Finally, the real partition ID is determined. For pitch mode, the partition ID is simply equal to the partition pre-ID. For 
blocklinear mode, the partition ID is adjusted as follows: 


* for 1, 3, 5, or 7-partition GPUs: no change [partition ID = partition pre-ID] 


* for 2 or 6-partition GPUs: XOR together all bits of partition ID adjust, then XOR the partition pre-ID with the 
resulting bit to get the partition ID 


* for 4-partition GPUs: add together bits 0-1, bits 2-3, and bit 4 of partition ID adjust, subtract the sum from the
partition pre-ID, and take the result modulo 4. This is the partition ID.


* for 8-partition GPUs: add together bits 0-2 and bits 3-4 of partition ID adjust, subtract the sum from the partition
pre-ID, and take the result modulo 8. This is the partition ID.


In summary: 


if not blocklinear or partition_count in [1, 3, 5, 7]:
    partition_id = partition_preid
elif partition_count in [2, 6]:
    xor = 0
    for bit in range(5):
        xor ^= partition_id_adjust >> bit & 1
    partition_id = partition_preid ^ xor
elif partition_count == 4:
    sub = partition_id_adjust & 3
    sub += partition_id_adjust >> 2 & 3
    sub += partition_id_adjust >> 4 & 1
    partition_id = (partition_preid - sub) % 4
elif partition_count == 8:
    sub = partition_id_adjust & 7
    sub += partition_id_adjust >> 3 & 3
    partition_id = (partition_preid - sub) % 8
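The adjustment rules can be checked with a self-contained Python version of the summary pseudocode. The name `partition_id_from` is hypothetical. Python's `%` always returns a non-negative result for a positive modulus, which matches the "take the result modulo N" wording when the subtraction goes negative.

```python
def partition_id_from(preid: int, adjust: int, partition_count: int,
                      blocklinear: bool) -> int:
    """Apply the blocklinear partition ID adjustment (hypothetical helper)."""
    if not blocklinear or partition_count in (1, 3, 5, 7):
        return preid
    if partition_count in (2, 6):
        # The XOR of all 5 adjust bits flips bit 0 of the pre-ID.
        parity = 0
        for bit in range(5):
            parity ^= adjust >> bit & 1
        return preid ^ parity
    if partition_count == 4:
        sub = (adjust & 3) + (adjust >> 2 & 3) + (adjust >> 4 & 1)
        return (preid - sub) % 4
    if partition_count == 8:
        sub = (adjust & 7) + (adjust >> 3 & 3)
        return (preid - sub) % 8
    raise ValueError(f"partition count {partition_count} not covered")


# 4 partitions, adjust 0b10111: sub = 3 + 1 + 1 = 5, (0 - 5) % 4 = 3
print(partition_id_from(0, 0b10111, 4, True))  # 3
```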


Tag memory addressing 


Todo: write me 


Subpartition cycle 


On GT215+, once the partition block index has been determined, it has to be further transformed to subpartition ID and 
subpartition block index. On G80, this step doesn't exist - partitions are not split into subpartitions, and “subpartition” 
in further steps should be taken to actually refer to a partition. 


The inputs to this process are: 
* partition block index 
* subpartition select mask 
* subpartition count 
The outputs of this process are: 
* subpartition ID 
* subpartition block index 
The subpartition configuration is stored in the following register: 
MMIO 0x100268: [GT215-] 
* bits 8-10: SELECT MASK, a 3-bit value affecting subpartition ID selection.
* bits 16-17: ??? 


* bits 28-29: ENABLE MASK, a 2-bit mask of enabled subpartitions. The only valid values are 1 [only 
subpartition 0 enabled] and 3 [both subpartitions enabled]. 


When only one subpartition is enabled, the subpartition cycle is effectively a NOP - subpartition ID is 0, and
subpartition block index is the same as partition block index. When both subpartitions are enabled, the subpartition
block index is the partition block index shifted right by 1, and the subpartition ID is based on the low 14 bits of
partition block index:


if subpartition_count == 1:
    subpartition_block_index = partition_block_index
    subpartition_id = 0
else:
    subpartition_block_index = partition_block_index >> 1
    # bit 0 and bits 4-13 of the partition block index are always used for
    # subpartition ID selection
    subpartition_select_bits = partition_block_index & 0x3ff1
    # bits 1-3 of the partition block index are only used if enabled by the
    # select mask
    subpartition_select_bits |= partition_block_index & (subpartition_select_mask << 1)
    # subpartition ID is a XOR of all the bits of subpartition_select_bits
    subpartition_id = 0
    for bit in range(14):
        subpartition_id ^= subpartition_select_bits >> bit & 1
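A runnable Python sketch of the subpartition cycle follows, together with a decoder for the MMIO 0x100268 fields listed earlier. Both function names are hypothetical, and the unknown bits 16-17 are ignored. The XOR of all selected bits is computed as the parity of their population count, which gives the same result as the bit-by-bit loop.

```python
from typing import Tuple


def decode_0x100268(value: int) -> Tuple[int, int]:
    """Return (select_mask, subpartition_count) from MMIO 0x100268."""
    select_mask = value >> 8 & 7        # bits 8-10
    enable_mask = value >> 28 & 3       # bits 28-29: only 1 or 3 are valid
    if enable_mask not in (1, 3):
        raise ValueError(f"invalid ENABLE MASK {enable_mask}")
    return select_mask, 2 if enable_mask == 3 else 1


def subpartition_cycle(pblock: int, select_mask: int,
                       subpartition_count: int) -> Tuple[int, int]:
    """Return (subpartition_id, subpartition_block_index)."""
    if subpartition_count == 1:
        return 0, pblock
    bits = pblock & 0x3ff1                   # bit 0 and bits 4-13
    bits |= pblock & (select_mask << 1)      # bits 1-3, if selected
    sid = bin(bits).count("1") & 1           # XOR of all selected bits
    return sid, pblock >> 1


select, count = decode_0x100268(0x30000100)
print(subpartition_cycle(0b10, select, count))  # (1, 1): bit 1 enabled by mask
```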


Todo: tag stuff? 


Row/bank/column split 


Todo: write me 


Bank cycle 


Todo: write me 


Storage types 


Todo: write me 


2.7.10 G80 VRAM compression 


Contents 


* G80 VRAM compression


— Introduction 


Todo: write me 
Introduction


Todo: write me 


2.7.11 G80:GF100 P2P memory access 


Contents 


* G80:GF100 P2P memory access


— Introduction 


— MMIO registers 


Todo: write me 


Introduction 


Todo: write me 


MMIO registers 


Todo: write me 


2.7.12 G80:GF100 BAR1 remapper 


Contents 


* G80:GF100 BAR1 remapper


— Introduction 


— MMIO registers 


Todo: write me 
Introduction


Todo: write me 


MMIO registers 


Todo: write me 


2.7.13 GF100 virtual memory 


Contents 


* GF100 virtual memory


— Introduction 


Todo: write me 


Introduction 


Todo: write me 


2.7.14 GF100- VRAM structure and usage 


Contents 


* GF100- VRAM structure and usage 


— Introduction 


Todo: write me 


Introduction 


Todo: write me 
2.7.15 GF100 VRAM compression


Contents 


* GF100 VRAM compression 


— Introduction 


Todo: write me 


Introduction 


Todo: write me 


2.8 PFIFO: command submission to execution engines 


Contents: 


2.8.1 FIFO overview 


Contents 


* FIFO overview 


— Introduction 


— Overall operation 


Introduction 


Commands to most of the engines are sent through a special engine called PFIFO. PFIFO maintains multiple fully
independent command queues, known as “channels” or “FIFOs”. Each channel is controlled through a “channel
control area”, which is a region of MMIO [pre-GF100] or VRAM [GF100+]. PFIFO intercepts all accesses to that
area and acts upon them.


PFIFO internally does time-sharing between the channels, but this is transparent to the user applications. The engines 
that PFIFO controls are also aware of channels, and maintain separate context for each channel. 


The context-switching ability of PFIFO depends on card generation. Since NV40, PFIFO is able to switch between 
channels at essentially any moment. On older cards, due to lack of backing storage for the CACHE, a switch is only 
possible when the CACHE is empty. The PFIFO-controlled engines are, however, much worse at switching: they can 
only switch between commands. While this wasn't a big problem on old cards, since the commands were guaranteed 
to execute in finite time, introduction of programmable shaders with looping capabilities made it possible to effectively 
hang the whole GPU by launching a long-running shader. 


142 Chapter 2. nVidia hardware documentation 


nVidia Hardware Documentation, Release git 


Todo: check if it still holds on GF100 


On NV1:NV4, the only engine that PFIFO controls is PGRAPH, the main 2d/3d engine of the card. In addition, PFIFO
can submit commands to the SOFTWARE pseudo-engine, which will trigger an interrupt for every submitted method.


The engines that PFIFO controls on NV4:GF100 are: 


Id | Present on           | Name     | Description
0  | all                  | SOFTWARE | Not really an engine, causes interrupt for each command, can be used to execute driver functions in sync with other commands.
1  | all                  | PGRAPH   | Main engine of the card: 2d, 3d, compute.
2  | NV31:G98, G200:MCP77 | PMPEG    | The PFIFO interface to VPE MPEG2 decoding engine.
3  | NV40:G84             | PME      | VPE motion estimation engine.
4  | NV41:G84             | PVP1     | VPE microcoded vector processor.
4  | VP2                  | PVP2     | xtensa-microcoded vector processor.
5  | VP2                  | PCIPHER  | AES cryptography and copy engine.
6  | VP2                  | PBSP     | xtensa-microcoded bitstream processor.
2  | VP3-                 | PPPP     | falcon-based video post-processor.
4  | VP3-                 | PPDEC    | falcon-based microcoded video decoder.
5  | VP3                  | PSEC     | falcon-based AES crypto engine. On VP4, merged into PVLD.
6  | VP3-                 | PVLD     | falcon-based variable length decoder.
3  | GT215-               | PCOPY    | falcon-based memory copy engine.
5  | MCP89:GF100          | PVCOMP   | falcon-based video compositing engine.


The engines that PFIFO controls on GF100- are: 


Id GF100 | Id GK104 | Id GK208 | Id GK20A | Id GM107 | Present on  | Name     | Description
1f       | 1f       | 1f       | 1f       | 1f       | all         | SOFTWARE | Not really an engine, causes interrupt for each command, can be used to execute driver functions in sync with other commands.
0        | 0        | 0        | 0        | 0        | all         | PGRAPH   | Main engine of the card: 2d, 3d, compute.
1        | 1        | 1        | ?        | -        | GF100:GM107 | PPDEC    | falcon-based microcoded picture decoder.
2        | 2        | 2        | ?        | -        | GF100:GM107 | PPPP     | falcon-based video post-processor.
3        | 3        | 3        | ?        | -        | GF100:GM107 | PVLD     | falcon-based variable length decoder.
4,5      | -        | -        | -        | -        | GF100:GK104 | PCOPY    | falcon-based memory copy engines.
-        | 6        | 5        | ?        | 2        | GK104:      | PVENC    | falcon-based H.264 encoding engine.
-        | 4,5,7    | 4,6      | ?        | 4,5      | GK104:      | PCOPY    | Memory copy engines.
-        | -        | -        | ?        | 1        | GM107:      | PVDEC    | falcon-based unified video decoding engine
-        | -        | -        | 7        | 3        | GM107:      | PSEC     | falcon-based AES crypto engine, recycled


This file deals only with the user-visible side of the PFIFO. For kernel-side programming, see nv1-pfifo, nv4-pfifo,
g80-pfifo, or gf100-pfifo.


Note: GF100 information can still be very incomplete / not exactly true. 
Overall operation


The PFIFO can be split into roughly 4 pieces: 
* PFIFO pusher: collects user's commands and injects them to 
* PFIFO CACHE: a big queue of commands waiting for execution by 
* PFIFO puller: executes the commands, passes them to the proper engine, or to the driver. 


* PFIFO switcher: ticks out the time slices for the channels and saves / restores the state of the channels between 
PFIFO registers and RAMFC memory. 


A channel consists of the following: 
* channel mode: PIO [NV1:GF100], DMA [NV4:GF100], or IB [G80-]
* PFIFO DMA pusher state [DMA and IB channels only]
* PFIFO CACHE state: the commands already accepted but not yet executed
* PFIFO puller state
* RAMFC: area of VRAM storing the above when channel is not currently active on PFIFO [not user-visible]
* RAMHT [pre-GF100 only]: a table of “objects” that the channel can use. The objects are identified by arbitrary
  32-bit handles, and can be DMA objects [see NV3 DMA objects, NV4:G80 DMA objects, DMA objects] or
  engine objects [see Puller - handling of submitted commands by FIFO and engine documentation]. On pre-G80
  cards, individual objects can be shared between channels.
* vspace [G80+ only]: A hierarchy of page tables that describes the virtual memory space visible to engines while
  executing commands for the channel. Multiple channels can share a vspace. [see Tesla virtual memory, GF100
  virtual memory]
* engine-specific state


Channel mode determines the way of submitting commands to the channel. PIO mode is available on pre-GF100
cards, and involves poking the methods directly into the channel control area. It's slow and fragile - everything breaks
down easily when more than one channel is used simultaneously. Not recommended. See PIO submission to FIFOs for
details. On NV1:NV40, all channels support PIO mode. On NV40:G80, only the first 32 channels support PIO mode. On
G80:GF100, only channel 0 supports PIO mode.


Todo: check PIO channels support on NV40:G80 


NV1 PFIFO doesn't support any DMA mode. 


NV3 PFIFO introduced a hacky DMA mode that requires kernel assistance for every submitted batch of commands 
and prevents channel switching while stuff is being submitted. See nv3-pfifo-dma for details. 


NV4 PFIFO greatly enhanced the DMA mode and made it controllable directly through the channel control area.
Thus, commands can now be submitted by multiple applications simultaneously, without coordination with each other
and without the kernel's help. DMA mode is described in DMA submission to FIFOs on NV4.


G80 introduced IB mode. IB mode is a modified version of DMA mode that, instead of following a single stream 
of commands from memory, has the ability to stitch together parts of multiple memory areas into a single command 
stream - allowing constructs that submit commands with parameters pulled directly from memory written by earlier 
commands. IB mode is described along with DMA mode in DMA submission to FIFOs on NV4. 


GF100 re-architected the whole PFIFO, made it possible to have up to 3 channels executing simultaneously, and
introduced a new DMA packet format.


The commands, as stored in CACHE, are tuples of: 
* subchannel: 0-7
* method: 0-0x1ffc [really 0-0x7ff] pre-GF100, 0-0x3ffc [really 0-0xfff] GF100+
* parameter: 0-0xffffffff
* submission mode [NV10+]: I or NI


Subchannel identifies the engine and object that the command will be sent to. The subchannels have no fixed
assignments to engines/objects, and can be freely bound/rebound to them by using method 0. The "objects" are individual
pieces of functionality of PFIFO-controlled engine. A single engine can expose any number of object types, though 
most engines only expose one. 


The method selects an individual command of the object bound to the selected subchannel, except methods 0-0xfc
which are special and are executed directly by the puller, ignoring the bound object. Note that, traditionally, methods
are treated as 4-byte addressable locations, and hence their numbers are written down multiplied by 4: method 0x3f
thus is written as 0xfc. This is a leftover from PIO channels. In the documentation, whenever a specific method
number is mentioned, it'll be written pre-multiplied by 4 unless specified otherwise.
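The "multiplied by 4" convention is easy to get wrong when moving between documented method numbers and raw method indices. A small Python sketch of the conversion follows; all three function names are hypothetical.

```python
def method_number(index: int) -> int:
    """Raw method index -> method number as written in this documentation."""
    return index * 4


def method_index(number: int) -> int:
    """Documented (pre-multiplied) method number -> raw method index."""
    if number % 4:
        raise ValueError(f"method number {number:#x} is not a multiple of 4")
    return number // 4


def handled_by_puller(number: int) -> bool:
    """Methods 0-0xfc are special and are executed by the puller itself."""
    return 0 <= number <= 0xfc


print(hex(method_number(0x3f)))   # 0xfc
print(hex(method_index(0x1ffc)))  # 0x7ff, the pre-GF100 maximum
```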


The parameter is an arbitrary 32-bit value that accompanies the method. 


The submission mode is I if the command was submitted through an increasing DMA packet, or NI if the command was
submitted through a non-increasing packet. This information isn't actually used for anything by the card, but it's stored
in the CACHE for a certain optimisation when submitting PGRAPH commands.


Method execution is described in detail in DMA puller and engine-specific documentation. 


Pre-NV1A, PFIFO treats everything as little-endian. NV1A introduced big-endian mode, which affects pushbuffer/IB
reads and semaphores. On NV1A:G80 cards, the endianness can be selected per channel via the big_endian flag. On
G80- cards, PFIFO endianness is a global switch.


Todo: look for GF100 PFIFO endian switch 


The channel control area endianness is not affected by the big endian flag or G80+ PFIFO endianness switch. Instead, 
it follows the PMC MMIO endianness switch. 


Todo: is it still true for GF100, with VRAM-backed channel control area? 


2.8.2 PIO submission to FIFOs 


Contents 


* PIO submission to FIFOs 
- Introduction 
— MMIO areas 
— Channel submission area 


— Free space determination 


— RAMRO 
Todo: write me


Introduction 


Todo: write me 


MMIO areas 


Todo: write me 


Channel submission area 


Todo: write me 


Free space determination 


Todo: write me 


RAMRO 


Todo: write me 


2.8.3 DMA submission to FIFOs on NV4 


Contents 


* DMA submission to FIFOs on NV4 
— Introduction
— Pusher state
— Errors
— Channel control area
— NV4-style mode
— IB mode
— The commands - pre-GF100 format
— The commands
    * NV4 method submission commands
    * NV4 control flow commands
    * NV4 SLI conditional command
— GF100 commands
— The pusher pseudocode - pre-GF100


Introduction 


There are two modes of DMA command submission: The NV4-style DMA mode and IB mode. 


Both of them are based on the concept of a “pushbuffer”: an area of memory that the user fills with commands and tells
PFIFO to process. The pushbuffers are then assembled into a “command stream” consisting of 32-bit words that make
up “commands”. In NV4-style DMA mode, the pushbuffer is always read linearly and converted directly to the command
stream, except when the “jump”, “return”, or “call” commands are encountered. In IB mode, the jump/call/return
commands are disabled, and the command stream is instead created with use of an “IB buffer”. The IB buffer is a circular
buffer of (base,length) pairs describing areas of pushbuffer that will be stitched together to create the command stream.


NV4-style mode is available on NV4:GF100, IB mode is available on G80+.


Todo: check for NV4-style mode on GF100 


In both cases, the command stream is then broken down to commands, which get executed. For most commands, the 
execution consists of storing methods into CACHE for execution by the puller. 


Pusher state 


The following data makes up the DMA pusher state:


Table 10

type   | name             | cards         | description
dmaobj | dma_pushbuffer   | :GF100 [1]    | the pushbuffer and IB DMA object
b32    | dma_limit        | :GF100 [1][2] | pushbuffer size limit
b32    | dma_put          | all           | pushbuffer current end address
b32    | dma_get          | all           | pushbuffer current read address
b11/12 | dma_state.mthd   | all           | Current method
b3     | dma_state.subc   | all           | Current subchannel
b24    | dma_state.mcnt   | all           | Current method count
b32    | dcount_shadow    | NV5:          | number of already-processed methods in cmd
bool   | dma_state.ni     | NV10+         | Current command's NI flag
bool   | dma_state.lenp   | G80+ [3]      | Large NI command length pending
b32    | ref              | NV10+         | reference counter [shared with puller]
bool   | subr_active      | NV1A+ [2]     | Subroutine active
b32    | subr_return      | NV1A+ [2]     | subroutine return address
bool   | big_endian       | NV1A:G80      | pushbuffer endian switch
bool   | sli_enable       | G80+          | SLI cond command enabled
b12    | sli_mask         | G80+          | SLI cond mask
bool   | sli_active       | NV40+         | SLI cond currently active
bool   | ib_enable        | G80+          | IB mode enabled
bool   | nonmain          | G80+ [3]      | non-main pushbuffer active
b8     | dma_put_high     | G80+          | extra 8 bits for dma_put
b8     | dma_put_high_rs  | G80+          | dma_put_high read shadow
b8     | dma_put_high_ws  | G80+ [1]      | dma_put_high write shadow
b8     | dma_get_high     | G80+          | extra 8 bits for dma_get
b8     | dma_get_high_rs  | G80+          | dma_get_high read shadow
b32    | ib_put           | G80+ [3]      | IB current end position
b32    | ib_get           | G80+ [3]      | IB current read position
b40    | ib_address       | G80+ [1][3]   | IB address
b8     | ib_order         | G80+ [1][3]   | IB size
b32    | dma_mget         | G80+ [3]      | main pushbuffer last read address
b8     | dma_mget_high    | G80+ [3]      | extra 8 bits for dma_mget
bool   | dma_mget_val     | G80+ [3]      | dma_mget valid flag
b8     | dma_mget_high_rs | G80+ [3]      | dma_mget_high read shadow
bool   | dma_mget_val_rs  | G80+ [3]      | dma_mget_val read shadow

[1] means that this part of state can only be modified by kernel intervention and is normally set just once, on channel setup.
[2] means that state only applies to NV4-style mode.
[3] means that state only applies to IB mode.


Errors


On pre-GF100, whenever the DMA pusher encounters problems, it'll raise a DMA_PUSHER error. There are 6 types
of DMA_PUSHER errors:
id | name              | reason
1  | CALL_SUBR_ACTIVE  | call command while subroutine active
2  | INVALID_MTHD      | attempt to submit a nonexistent special method
3  | RET_SUBR_INACTIVE | return command while subroutine inactive
4  | INVALID_CMD       | invalid command
5  | IB_EMPTY          | attempt to submit zero-length IB entry
6  | MEM_FAULT         | failure to read from pushbuffer or IB


Apart from pusher state, the following values are available on NV5+ to aid troubleshooting:
* dma_get_jmp_shadow: value of dma_get before the last jump
* rsvd_shadow: the first word of last-read command
* data_shadow: the last-read data word


Todo: verify those 


Todo: determine what happens on GF100 on all imaginable error conditions 


Channel control area 


The channel control area is used to tell the card about submitted pushbuffers. The area is at least 0x1000 bytes long,
though it can be longer depending on the card generation. Everything in the area should be accessed as 32-bit integers,
like almost all of the MMIO space. The following addresses are usable:


addr | R/W | name          | description
0x40 | R/W | DMA_PUT       | dma_put, only writable when not in IB mode
0x44 | R   | DMA_GET       | dma_get
0x48 | R   | REF           | ref
0x4c | R/W | DMA_PUT_HIGH  | dma_put_high_rs/ws, only writable when not in IB mode
0x50 | R/W | ???           | GF100+ only
0x54 | R   | DMA_CGET      | NV40+ only, NV4-style mode only; connected to subr_return when subroutine active, dma_get when inactive.
0x58 | R   | DMA_MGET      | dma_mget
0x5c | R   | DMA_MGET_HIGH | dma_mget_high_rs, dma_mget_val_rs
0x60 | R   | DMA_GET_HIGH  | dma_get_high_rs
0x88 | R   | IB_GET        | ib_get [IB mode only]
0x8c | R/W | IB_PUT        | ib_put [IB mode only]


The channel control area is accessed in 32-bit chunks, but on G80+, DMA_GET, DMA_PUT and DMA_MGET are
effectively 40-bit quantities. To prevent races, the high parts of them have read and write shadows. When you read
the address corresponding to the low part, the whole value is atomically read. The low part is returned as the result of
the read, while the high part is copied to the corresponding read shadow where it can be read through a second access
to the other address. DMA_PUT also has a write shadow of the high part - when the low part address is written, it's
assembled together with the write shadow and atomically written.


To summarise, when you want to read the full DMA_PUT/GET/MGET, first read the low part, then the high part. Due to
the shadows, the value thus read will be correct. To write the full value of DMA_PUT, first write the high part, then
the low part.
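The shadow mechanism and the required access order can be modelled in a few lines of Python. This is a hypothetical software model, not a driver interface: `Split40` stands in for one 40-bit pointer such as DMA_PUT, and `read40`/`write40` follow the low-then-high read order and high-then-low write order described above.

```python
class Split40:
    """Hypothetical model of a 40-bit pointer exposed as two 32-bit registers."""

    def __init__(self, value: int = 0) -> None:
        self.value = value & ((1 << 40) - 1)
        self.read_shadow = 0    # high part latched by a low-part read
        self.write_shadow = 0   # high part waiting for a low-part write

    def read_low(self) -> int:
        snapshot = self.value                    # whole value sampled at once
        self.read_shadow = snapshot >> 32 & 0xff
        return snapshot & 0xffffffff

    def read_high(self) -> int:
        return self.read_shadow

    def write_high(self, high: int) -> None:
        self.write_shadow = high & 0xff

    def write_low(self, low: int) -> None:
        # Combined with the write shadow and stored in one step.
        self.value = self.write_shadow << 32 | (low & 0xffffffff)


def read40(reg: Split40) -> int:
    low = reg.read_low()                 # low part first: latches the high part
    return reg.read_high() << 32 | low


def write40(reg: Split40, value: int) -> None:
    reg.write_high(value >> 32 & 0xff)   # high part first
    reg.write_low(value & 0xffffffff)    # low part commits both halves
```

Even if the pointer advances across a 4 GiB boundary between the two reads (e.g. from 0x12fffffffc to 0x1300000004), the high part comes from the read shadow, so the reader still sees the consistent value 0x12fffffffc rather than a mix of old and new halves.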
Note, however, that two different threads reading these values simultaneously can interfere with each other. For this
reason, the channel control area shouldn't ever be accessed by more than one thread at once, even for reading.


On NV4:NV40 cards, the channel control area is in BAR0 at address 0x800000 + 0x10000 * channel ID. On NV40,
there are two BAR0 regions with channel control areas: the old-style one is in BAR0 at 0x800000 + 0x10000 * channel
ID, supports channels 0-0x1f, can do both PIO and DMA submission, but does not have DMA_CGET when used in
DMA mode. The new-style area is in BAR0 at 0xc00000 + 0x1000 * channel ID, supports only DMA mode, supports
all channels, and has DMA_CGET. On G80 cards, channel 0 supports PIO mode and has its channel control area at
0x800000, while channels 1-126 support DMA mode and have channel control areas at 0xc00000 + 0x2000 * channel
ID. On GF100, the channel control areas are accessed through selectable addresses in BAR1 and are backed by VRAM
or host memory - see GF100+ PFIFO for more details.
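The address rules above can be collected into one lookup function. This Python sketch is illustrative only: the function name and the `arch` tags ("NV4" for NV4:NV40, "NV40" for the NV40 family up to G80, "G80" for G80:GF100) are hypothetical, and the channel ranges are taken from the text, which itself still carries an unverified-numbers todo.

```python
def channel_control_area(arch: str, channel_id: int, new_style: bool = True) -> int:
    """Return the BAR0 offset of a pre-GF100 channel control area."""
    if arch == "NV4":
        return 0x800000 + 0x10000 * channel_id
    if arch == "NV40":
        if new_style:
            # DMA mode only, all channels, has DMA_CGET
            return 0xc00000 + 0x1000 * channel_id
        if not 0 <= channel_id <= 0x1f:
            raise ValueError("old-style area only covers channels 0-0x1f")
        return 0x800000 + 0x10000 * channel_id
    if arch == "G80":
        if channel_id == 0:
            return 0x800000                  # the PIO channel
        if 1 <= channel_id <= 126:
            return 0xc00000 + 0x2000 * channel_id
        raise ValueError("G80 DMA channels are 1-126")
    raise ValueError("GF100+ uses selectable BAR1 addresses instead")


print(hex(channel_control_area("G80", 2)))  # 0xc04000
```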


Todo: check channel numbers 


NV4-style mode 


In NV4-style mode, whenever dma_get != dma_put, the card reads a 32-bit word from the pushbuffer at the address
specified by dma_get, increments dma_get by 4, and treats the word as the next word in the command stream. dma_get
can also move through the control flow commands: jump [sets dma_get to param], call [copies dma_get to subr_return,
sets subr_active and sets dma_get to param], and return [unsets subr_active, copies subr_return to dma_get]. The calls
and returns are only available on NV1A+ cards.


The pushbuffer is accessed through the dma_pushbuffer DMA object. On NV4, the DMA object has to be located in
PCI or AGP memory. On NV5+, any DMA object is valid. At all times, dma_get has to be <= dma_limit. Going past
the limit or getting a VM fault when attempting to read from the pushbuffer results in raising a DMA_PUSHER error of
type MEM_FAULT.


On pre-NV1A cards, the word read from the pushbuffer is always treated as little-endian. On NV1A:G80 cards, the
endianness is determined by the big_endian flag. On G80+, the PFIFO endianness is a global switch.


Todo: What about GF100? 


Note that pushbuffer addresses over 0xffffffff shouldn't be used in NV4-style mode, even on G80 - they cannot be
expressed in jump commands, dma_limit, or subr_return. Why dma_put writing supports them is a mystery.


The usual way to use NV4-style mode is:
1. Allocate a big circular buffer
2. [NV1A+] if you intend to use subroutines, allocate space for them and write them out
3. Point dma_pushbuffer to the buffer, set dma_get and dma_put to its start
4. To submit commands:

   1. If there's not enough space in the pushbuffer between dma_put and the end to fit the command + a jump
      command, submit a jump-to-beginning command first and set DMA_PUT to buffer start.

   2. Read DMA_GET/DMA_CGET until you get a value that's out of the range you're going to write. If on
      pre-NV40 and using subroutines, discard DMA_GET reads that are outside of the main buffer.

   3. Write out the commands at the current DMA_PUT address.

   4. Set DMA_PUT to point right after the last word of commands you wrote.
IB mode


NV4-style mode, while fairly flexible, can only jump between parts of pushbuffer between commands. IB mode 
decouples flow control from the command structure by using a second “master” buffer, called the IB buffer. 


The IB buffer is a circular buffer of 8-byte structures called IB entries. The IB buffer is, like the pushbuffer, accessed
through the dma_pushbuffer DMA object. The address of the IB buffer, along with its size, is normally specified on
channel creation. The size has to be a power of two and can be in range ???.


Todo: check the ib size range 


There are two indices into the IB buffer: ib_get and ib_put. They're both in range of 0..2^ib_order-1. Whenever no
pushbuffer is being processed [dma_put == dma_get], and there are unread entries in the IB buffer [ib_put != ib_get], the
card will read an entry from IB buffer entry ib_get and increment ib_get by 1. When ib_get would reach 2^ib_order,
it instead wraps around to 0.
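The index arithmetic can be sketched in Python as follows. `ib_advance` and `ib_pending` follow directly from the text. `ib_free` is an assumption: because ib_get == ib_put means "nothing to read", a writer presumably has to keep at least one entry unused so a full ring is not mistaken for an empty one. All three names are hypothetical.

```python
def ib_advance(ib_get: int, ib_order: int) -> int:
    """Index of the next IB entry; wraps from 2**ib_order - 1 back to 0."""
    return (ib_get + 1) % (1 << ib_order)


def ib_pending(ib_get: int, ib_put: int, ib_order: int) -> int:
    """Entries written by the user but not yet read by PFIFO."""
    return (ib_put - ib_get) % (1 << ib_order)


def ib_free(ib_get: int, ib_put: int, ib_order: int) -> int:
    """ASSUMPTION: one entry is kept unused so full and empty differ."""
    return (1 << ib_order) - 1 - ib_pending(ib_get, ib_put, ib_order)


print(ib_advance(7, 3), ib_pending(6, 1, 3))  # 0 3
```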


Failure to read an IB entry due to a VM fault will, like a pushbuffer read fault, cause a DMA_PUSHER error of type
MEM_FAULT.


The IB entry is made of two 32-bit words in PFIFO endianness. Their format is:
Word 0:

* bits 0-1: unused, should be 0
* bits 2-31: ADDRESS_LOW, bits 2-31 of pushbuffer start address


Word 1:
* bits 0-7: ADDRESS_HIGH, bits 32-39 of pushbuffer start address
* bit 8: ???
* bit 9: NOT_MAIN, "not main pushbuffer" flag
* bits 10-30: SIZE, pushbuffer size in 32-bit words
* bit 31: NO_PREFETCH (probably; use for pushbuffer data generated by the GPU)


Todo: figure out bit 8 some day 


When an IB entry is read, the pushbuffer is prepared for reading: 


dma_get[2:39] = ADDRESS
dma_put = dma_get + SIZE * 4
nonmain = NOT_MAIN
if (!nonmain) dma_mget = dma_get


Subsequently, just like in NV4-style mode, words from dma_get are read until it reaches dma_put. When that happens,
processing can move on to the next IB entry [or pause until the user sends more commands]. If the nonmain flag is not
set, dma_get is copied to dma_mget whenever it's advanced, and the dma_mget_val flag is set to 1. dma_limit is ignored
in IB mode.


An attempt to submit an IB entry with length zero will raise a DMA_PUSHER error of type IB_EMPTY.
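The IB entry layout and the preparation step can be exercised with a Python sketch. `DmaPusherError`, `IbPusherState`, `encode_ib_entry` and `load_ib_entry` are hypothetical names. The state class holds only the fields touched here, bit 8 is left at 0 because its meaning is unknown, and NO_PREFETCH is encoded as documented even though its meaning is only "probably" known.

```python
from dataclasses import dataclass
from typing import Tuple


class DmaPusherError(Exception):
    """Hypothetical stand-in for a DMA_PUSHER error; the arg is its type."""


@dataclass
class IbPusherState:
    """Hypothetical subset of the pusher state touched by an IB entry."""
    dma_get: int = 0
    dma_put: int = 0
    dma_mget: int = 0
    nonmain: bool = False


def encode_ib_entry(address: int, size_words: int, not_main: bool = False,
                    no_prefetch: bool = False) -> Tuple[int, int]:
    """Pack (word0, word1) using the documented bit layout."""
    if address & 3 or not 0 <= address < 1 << 40:
        raise ValueError("address must be 4-byte aligned and below 2**40")
    if not 0 < size_words < 1 << 21:
        raise ValueError("SIZE must be non-zero and fit in bits 10-30")
    word0 = address & 0xfffffffc                    # ADDRESS_LOW
    word1 = ((address >> 32) & 0xff                 # ADDRESS_HIGH
             | int(not_main) << 9                   # NOT_MAIN
             | size_words << 10                     # SIZE
             | int(no_prefetch) << 31)              # NO_PREFETCH (probably)
    return word0, word1


def load_ib_entry(state: IbPusherState, word0: int, word1: int) -> None:
    """Prepare the pushbuffer for reading, following the pseudocode."""
    size = (word1 >> 10) & 0x1fffff
    if size == 0:
        raise DmaPusherError("IB_EMPTY")
    state.dma_get = (word1 & 0xff) << 32 | (word0 & 0xfffffffc)
    state.dma_put = state.dma_get + size * 4
    state.nonmain = bool((word1 >> 9) & 1)
    if not state.nonmain:
        state.dma_mget = state.dma_get


state = IbPusherState()
load_ib_entry(state, *encode_ib_entry(0x1234567800, 16))
print(hex(state.dma_get), hex(state.dma_put))  # 0x1234567800 0x1234567840
```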


The nonmain flag is meant to help with a common case where pushbuffers sent through IB can come from two sources: 
a "main" big circular buffer filled with immediately generated commands, and "external" buffers containing helper 
data filled and managed through other means. DMA_MGET will then contain the address of the current position
in the “main” buffer without being affected by IB entries pulling data from other pushbuffers. It's thus similar to
DMA_CGET's role in NV4-style mode.


The commands - pre-GF100 format 


The command stream, as assembled by NV4-style or IB mode pushbuffer read, is then split into individual commands. 
The command type is determined by its first word. The word has to match one of the following forms: 


000CCCCCCCCCCC00SSSMMMMMMMMMMM00 | increasing methods [NV4+]
0000000000000001MMMMMMMMMMMMXX00 | SLI conditional [NV40+, if enabled]
00000000000000100000000000000000 | return [NV1A+, NV4-style only]
0000000000000011SSSMMMMMMMMMMM00 | long non-increasing methods [IB only]
001JJJJJJJJJJJJJJJJJJJJJJJJJJJ00 | old jump [NV4+, NV4-style only]
010CCCCCCCCCCC00SSSMMMMMMMMMMM00 | non-increasing methods [NV10+]
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ01 | jump [NV1A+, NV4-style only]
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ10 | call [NV1A+, NV4-style only]


Todo: do an exhaustive scan of commands


If none of the forms matches, or if the one that matches cannot be used in the current mode, the INVALID_CMD
DMA_PUSHER error is raised.
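As a rough decoder for the forms above, the following Python sketch classifies the first word of a command. The function name `classify_command` and the returned kind strings are hypothetical. It applies only the NV4-style/IB mode restriction and ignores the card generation and whether SLI conditionals are enabled.

```python
def classify_command(word: int, ib_mode: bool):
    """Return (kind, fields) for the first word of a pre-GF100 command."""
    low2 = word & 3
    if low2 == 1:                                   # ...01: jump
        kind, fields = "jump", {"target": word & 0xfffffffc}
    elif low2 == 2:                                 # ...10: call
        kind, fields = "call", {"target": word & 0xfffffffc}
    elif low2 == 3:
        return "invalid", {}
    else:
        top3 = word >> 29                           # bits 29-31
        mid2 = word >> 16 & 3                       # bits 16-17
        mthd = {"subc": word >> 13 & 7, "method": word & 0x1ffc}
        count = word >> 18 & 0x7ff                  # bits 18-28
        if top3 == 0 and mid2 == 0:
            kind, fields = "increasing", dict(mthd, count=count)
        elif top3 == 2 and mid2 == 0:
            kind, fields = "non-increasing", dict(mthd, count=count)
        elif top3 == 1:
            kind, fields = "old jump", {"target": word & 0x1ffffffc}
        elif word >> 18 == 0 and mid2 == 1:
            kind, fields = "sli conditional", {"mask": word >> 4 & 0xfff}
        elif word == 0x00020000:
            kind, fields = "return", {}
        elif word >> 18 == 0 and mid2 == 3:
            kind, fields = "long non-increasing", mthd
        else:
            return "invalid", {}
    # Mode restrictions: control flow is NV4-style only, long NI is IB only.
    if ib_mode and kind in ("jump", "call", "return", "old jump"):
        return "invalid", {}
    if not ib_mode and kind == "long non-increasing":
        return "invalid", {}
    return kind, fields


print(classify_command(0x00042100, False))
```

The final call decodes an increasing-methods command with count 1, subchannel 1 and method 0x100.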


The commands 


There are two command formats the DMA pusher can use: NV4 format and GF100 format. All cards support the NV4 
format, while only GF100+ cards support the GF100 format. 


NV4 method submission commands 


000CCCCCCCCCCC00SSSMMMMMMMMMMM00 | increasing methods [NV4+]
010CCCCCCCCCCC00SSSMMMMMMMMMMM00 | non-increasing methods [NV10+]
0000000000000011SSSMMMMMMMMMMM00 | long non-increasing methods [IB only]


These three commands are used to submit methods. The MM..M field selects the first method that will be submitted.
The SSS field selects the subchannel. The CC..C field is mthd count and says how many words will be submitted.
With the “long non-increasing methods” command, the method count is instead contained in the low 24 bits of the next
word in the pushbuffer.


The subsequent mthd count words after the first word [or second word in case of the long command] are the method 
parameters to be submitted. If command type is increasing methods, the method number increases by 4 [ie. by 1 
method] for each submitted word. If type is non-increasing, all words are submitted to the same method. 
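The following minimal sketch shows how a methods command fans out into individual method submissions. It assumes method addresses are given in bytes, so "the next method" is +4. `expand_methods` is a hypothetical helper and not part of any driver API.

```python
def expand_methods(first_mthd, params, increasing):
    """Pair each data word of a methods command with its target method.

    first_mthd is a byte address (hypothetical convention for this sketch).
    Increasing commands advance by 4 per word; non-increasing ones stay put.
    """
    pairs = []
    mthd = first_mthd
    for param in params:
        pairs.append((mthd, param))
        if increasing:
            mthd += 4
    return pairs


print(expand_methods(0x100, [0xa, 0xb, 0xc], increasing=True))
# → [(256, 10), (260, 11), (264, 12)]
```

A non-increasing command is typically used to stream many words into a single data-port style method.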


If sli_enable is set and sli_active is not set, the methods thus assembled will be discarded. Otherwise, they'll be
appended to the CACHE.


Todo: didn't mthd 0 work even if sli_active=0?
The pusher watches the submitted methods: it only passes methods 0x100+ and those methods in the 0..0xfc range that
the puller recognises. An attempt to submit an invalid method in the 0..0xfc range will cause a DMA_PUSHER error of
type INVALID_MTHD.


Todo: check pusher reaction on ACQUIRE submission: pause? 


NV4 control flow commands


001JJJJJJJJJJJJJJJJJJJJJJJJJJJ00   old jump [NV4+]
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ01   jump [NV1A+]
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ10   call [NV1A+]
00000000000000100000000000000000   return [NV1A+]


For jumps and calls, J..JJ is bits 2-28 or 2-31 of the target address. The remaining bits of target are forced to 0. 


The jump commands simply set dma_get to the target - the next command will be read from there. There are two jump
commands, since NV4 originally supported only 29-bit addresses and used the high bits as command type. NV1A
introduced the new jump command, which instead uses the low bits as type and allows access to the full 32-bit address
range.


The call command copies dma_get to subr_return, sets subr_active to 1, and sets dma_get to the target. If subr_active
is already set before the call, a DMA_PUSHER error of type CALL_SUBR_ACTIVE is raised.


The return command copies subr_return to dma_get and clears subr_active. If subr_active isn't set, it instead raises a
DMA_PUSHER error of type RET_SUBR_INACTIVE.
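Because there is only one subr_return register, call/return forms a single-level subroutine mechanism: nested calls are an error. The sketch below models just that state. The class name is hypothetical, and dma_get is assumed to already point past the call word when the call executes, as in the pusher pseudocode.

```python
class PusherSubroutine:
    """Single-level call/return state (hypothetical model; names follow the text)."""

    def __init__(self, dma_get=0):
        self.dma_get = dma_get      # already advanced past the current command word
        self.subr_return = 0
        self.subr_active = False

    def call(self, target):
        if self.subr_active:
            raise RuntimeError("CALL_SUBR_ACTIVE")
        self.subr_return = self.dma_get
        self.subr_active = True
        self.dma_get = target

    def ret(self):
        if not self.subr_active:
            raise RuntimeError("RET_SUBR_INACTIVE")
        self.dma_get = self.subr_return
        self.subr_active = False


p = PusherSubroutine(dma_get=0x1004)
p.call(0x8000)   # dma_get -> 0x8000, subr_return = 0x1004
p.ret()          # dma_get -> 0x1004 again
```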


NV4 SLI conditional command


0000000000000001MMMMMMMMMMMMXX00   SLI conditional [NV40+]


NV40 introduced SLI functionality. One of the associated features is the SLI conditional command. In SLI mode,
sister channels are commonly created on all cards in the SLI set using a common pushbuffer. Since most of the commands
sent in SLI mode will be identical for all cards, this saves resources. However, some of the commands have to be sent
only to a single card, or to a subgroup of cards. The SLI conditional can be used for that purpose.


The sli_active flag determines if methods should be accepted at the moment: when it's set, methods will be accepted.
Otherwise, they'll be ignored. The SLI conditional command takes the encoded mask, MM..M, ANDs it with the per-card
value of sli_mask, and sets the sli_active flag to 1 if the result is non-0, or to 0 otherwise.


The sli_enable flag determines if the command is available. If it's not set, the command effectively doesn't exist.
Note that sli_enable and sli_mask exist on both NV40:G80 and G80+, but on NV40:G80 they have to be set uniformly for
all channels on the card, while G80+ allows independent settings for each channel.


The XX bits in the command are ignored. 
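To illustrate the SLI conditional, here is a sketch of how one shared command word is evaluated independently on each card. The per-card sli_mask values 0x1 and 0x2 are hypothetical, chosen only for the example. `sli_conditional` is not an existing API.

```python
def sli_conditional(word, sli_mask):
    """New sli_active value for one card after an SLI conditional command (hypothetical helper)."""
    if (word & 0xffff0003) != 0x00010000:
        raise ValueError("not an SLI conditional command")
    # MM..M sits in bits 4-15; the XX bits 2-3 are ignored.
    return 1 if sli_mask & ((word >> 4) & 0xfff) else 0


# Two cards with (hypothetical) per-card sli_mask values 0x1 and 0x2 share one pushbuffer.
# The command below encodes MM..M = 0x001, so only the first card keeps accepting methods.
cmd = 0x00010000 | (0x001 << 4)
print([sli_conditional(cmd, mask) for mask in (0x1, 0x2)])  # → [1, 0]
```

A later SLI conditional with MM..M covering all cards' masks would re-enable method acceptance everywhere.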


GF100 commands 


GF100 format follows the same idea, but uses all-new command encoding. 
000CCCCCCCCCCC00SSSMMMMMMMMMMMXX   increasing methods [old]
000XXXXXXXXXXX01MMMMMMMMMMMMXXXX   SLI conditional
000XXXXXXXXXXX10MMMMMMMMMMMMXXXX   SLI user mask store [new]
000XXXXXXXXXXX11XXXXXXXXXXXXXXXX   SLI conditional from user mask [new]
001CCCCCCCCCCCCCSSSXMMMMMMMMMMMM   increasing methods [new]
010CCCCCCCCCCC00SSSMMMMMMMMMMMXX   non-increasing methods [old]
011CCCCCCCCCCCCCSSSXMMMMMMMMMMMM   non-increasing methods [new]
100VVVVVVVVVVVVVSSSXMMMMMMMMMMMM   inline method [new]
101CCCCCCCCCCCCCSSSXMMMMMMMMMMMM   increase-once methods [new]
110XXXXXXXXXXXXXXXXXXXXXXXXXXXXX   ??? [XXX] [new]


Todo: check bitfield boundaries


Todo: check the extra SLI bits 


Todo: look for other forms 


Increasing and non-increasing methods work like on older cards. Increase-once methods is a new command that works
like the other methods commands, but sends the first data word to method M, and the second and all subsequent data
words to method M+4 [ie. the next method].


The inline method command is a single-word command that submits a single method with a short [12-bit] parameter
encoded in the VV..V field.
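Under the GF100 layout table above, the new-format method commands can be split into fields as in the sketch below. The field widths are read off that table: 13 bits for CC..C/VV..V, 3 for SSS, and 12 for MM..M. The text still flags these boundaries as unverified, and the prose gives the inline parameter as 12-bit while the table shows 13 bits. The sketch follows the table. The names are hypothetical. MM..M is returned raw because its units are not stated here.

```python
# Hypothetical names; field widths follow the GF100 layout table in this section.
NEW_FORMAT_KINDS = {1: "increasing", 3: "non_increasing", 4: "inline", 5: "increase_once"}


def decode_gf100_new_cmd(word):
    """Split a 'new'-format GF100 method command into its raw fields."""
    kind = NEW_FORMAT_KINDS.get(word >> 29)
    if kind is None:
        raise ValueError("not a new-format method command")
    count_or_value = (word >> 16) & 0x1fff  # CC..C, or VV..V for inline
    subc = (word >> 13) & 7                 # SSS
    mthd = word & 0xfff                     # MM..M, left raw
    return kind, count_or_value, subc, mthd


def expand_increase_once(first_mthd, params):
    """First word goes to first_mthd, every later word to first_mthd + 4 (byte addresses)."""
    return [(first_mthd if i == 0 else first_mthd + 4, p) for i, p in enumerate(params)]


print(decode_gf100_new_cmd(0xA0034040))        # → ('increase_once', 3, 2, 64)
print(expand_increase_once(0x100, [7, 8, 9]))  # → [(256, 7), (260, 8), (260, 9)]
```

Increase-once suits methods that take one setup word followed by a stream of data words to a single port.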


GF100 also did away with the INVALID_MTHD error - invalid low methods are pushed into the CACHE as usual, and the
puller will complain about them instead when it tries to execute them.


The pusher pseudocode - pre-GF100 


while(1) {
    if (dma_get != dma_put) {
        /* pushbuffer non-empty, read a word. */
        b32 word;
        try {
            if (!ib_enable && dma_get >= dma_limit)
                throw DMA_PUSHER(MEM_FAULT);
            if (gpu < NV1A)
                word = READ_DMAOBJ_32(dma_pushbuffer, dma_get, LE);
            else if (gpu < G80)
                word = READ_DMAOBJ_32(dma_pushbuffer, dma_get, big_endian?BE:LE);
            else
                word = READ_DMAOBJ_32(dma_pushbuffer, dma_get, pfifo_endian);
            dma_get += 4;
            if (!nonmain)
                dma_mget = dma_get;
        } catch (VM_FAULT) {
            throw DMA_PUSHER(MEM_FAULT);
        }
        /* now, see if we're in the middle of a command */
        if (dma_state.lenp) {
            /* second word of long non-inc methods command - method count */
            dma_state.lenp = 0;
            dma_state.mcnt = word & 0xffffff;
        } else if (dma_state.mcnt) {
            /* data word of methods command */
            data_shadow = word;
            if (!PULLER_KNOWS_MTHD(dma_state.mthd))
                throw DMA_PUSHER(INVALID_MTHD);
            if (!sli_enable || sli_active) {
                CACHE_PUSH(dma_state.subc, dma_state.mthd, word, dma_state.ni);
            }
            if (!dma_state.ni)
                dma_state.mthd++;
            dma_state.mcnt--;
            dcount_shadow++;
        } else {
            /* no command active - this is the first word of a new one */
            rsvd_shadow = word;
            /* match all forms */
            if ((word & 0xe0000003) == 0x20000000 && !ib_enable) {
                /* old jump */
                dma_get_jmp_shadow = dma_get;
                dma_get = word & 0x1fffffff;
            } else if ((word & 3) == 1 && !ib_enable && gpu >= NV1A) {
                /* jump */
                dma_get_jmp_shadow = dma_get;
                dma_get = word & 0xfffffffc;
            } else if ((word & 3) == 2 && !ib_enable && gpu >= NV1A) {
                /* call */
                if (subr_active)
                    throw DMA_PUSHER(CALL_SUBR_ACTIVE);
                subr_return = dma_get;
                subr_active = 1;
                dma_get = word & 0xfffffffc;
            } else if (word == 0x00020000 && !ib_enable && gpu >= NV1A) {
                /* return */
                if (!subr_active)
                    throw DMA_PUSHER(RET_SUBR_INACTIVE);
                dma_get = subr_return;
                subr_active = 0;
            } else if ((word & 0xe0030003) == 0) {
                /* increasing methods */
                dma_state.mthd = (word >> 2) & 0x7ff;
                dma_state.subc = (word >> 13) & 7;
                dma_state.mcnt = (word >> 18) & 0x7ff;
                dma_state.ni = 0;
                dcount_shadow = 0;
            } else if ((word & 0xe0030003) == 0x40000000 && gpu >= NV10) {
                /* non-increasing methods */
                dma_state.mthd = (word >> 2) & 0x7ff;
                dma_state.subc = (word >> 13) & 7;
                dma_state.mcnt = (word >> 18) & 0x7ff;
                dma_state.ni = 1;
                dcount_shadow = 0;
            } else if ((word & 0xffff0003) == 0x00030000 && ib_enable) {
                /* long non-increasing methods */
                dma_state.mthd = (word >> 2) & 0x7ff;
                dma_state.subc = (word >> 13) & 7;
                dma_state.lenp = 1;
                dma_state.ni = 1;
                dcount_shadow = 0;
            } else if ((word & 0xffff0003) == 0x00010000 && sli_enable) {
                /* SLI conditional */
                if (sli_mask & ((word >> 4) & 0xfff))
                    sli_active = 1;
                else
                    sli_active = 0;
            } else {
                throw DMA_PUSHER(INVALID_CMD);
            }
        }
    } else if (ib_enable && ib_get != ib_put) {
        /* current pushbuffer empty, but we have more IB entries to read */
        b64 entry;
        try {
            entry_low = READ_DMAOBJ_32(dma_pushbuffer, ib_address + ib_get * 8, pfifo_endian);
            entry_high = READ_DMAOBJ_32(dma_pushbuffer, ib_address + ib_get * 8 + 4, pfifo_endian);
            entry = entry_high << 32 | entry_low;
            ib_get++;
            if (ib_get == (1 << ib_order))
                ib_get = 0;
        } catch (VM_FAULT) {
            throw DMA_PUSHER(MEM_FAULT);
        }
        len = entry >> 42 & 0x3fffff;
        if (!len)
            throw DMA_PUSHER(IB_EMPTY);
        dma_get = entry & 0xfffffffffc;
        dma_put = dma_get + len * 4;
        if (entry & 1 << 41)
            nonmain = 1;
        else
            nonmain = 0;
    }
    /* otherwise, pushbuffer empty and IB empty or nonexistent - nothing to do. */
}
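The IB-entry branch of the pseudocode above can be isolated as a pure function. The sketch below uses the same bit positions: a 40-bit start address in the low bits, the nonmain flag in bit 41, and the length in words in bits 42-63. The function name is hypothetical. It returns the new (dma_get, dma_put, nonmain) instead of updating pusher state.

```python
def decode_ib_entry(entry_low, entry_high):
    """Turn one 8-byte IB entry into (dma_get, dma_put, nonmain) (hypothetical helper)."""
    entry = entry_high << 32 | entry_low
    length = entry >> 42 & 0x3fffff        # length in 32-bit words
    if not length:
        raise ValueError("IB_EMPTY")
    dma_get = entry & 0xfffffffffc         # 40-bit, 4-byte aligned start address
    dma_put = dma_get + length * 4
    nonmain = bool(entry & 1 << 41)
    return dma_get, dma_put, nonmain


# A 16-word main-buffer segment at address 0x0123456780.
e = 16 << 42 | 0x0123456780
print([hex(v) for v in decode_ib_entry(e & 0xffffffff, e >> 32)[:2]])
# → ['0x123456780', '0x1234567c0']
```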


2.8.4 Puller - handling of submitted commands by FIFO 


Contents 


* Puller - handling of submitted commands by FIFO
  - Introduction
  - RAMHT and the FIFO objects
    * NV4:GF100
    * NV3
    * NV1
  - Puller state
  - Engine objects
  - Puller builtin methods
    * Syncing with host: reference counter
    * Semaphores
    * Misc puller methods


Introduction 
PFIFO puller's job is taking methods out of the CACHE and delivering them to the right place for execution, or 
executing them directly. 


Methods 0-0xfc are special and executed by the puller. Methods 0x100 and up are forwarded to the engine object
currently bound to a given subchannel. The methods are:


Method          Present on    Name                      Description
0x0000          all           OBJECT                    Binds an engine object
0x0008          GF100-        NOP                       Does nothing
0x0010          G84-          SEMAPHORE_ADDRESS_HIGH    New-style semaphore address high part
0x0014          G84-          SEMAPHORE_ADDRESS_LOW     New-style semaphore address low part
0x0018          G84-          SEMAPHORE_SEQUENCE        New-style semaphore payload
0x001c          G84-          SEMAPHORE_TRIGGER         New-style semaphore trigger
0x0020          G84-          NOTIFY_INTR               Triggers an interrupt
0x0024          G84-          WRCACHE_FLUSH             Flushes write post caches
0x0028          MCP89-        ???                       ???
0x002c          MCP89-        ???                       ???
0x0050          NV10-         REF_CNT                   Writes the ref counter
0x0060          NV1A:GF100    DMA_SEMAPHORE             DMA object for semaphores
0x0064          NV1A-         SEMAPHORE_OFFSET          Old-style semaphore address
0x0068          NV1A-         SEMAPHORE_ACQUIRE         Old-style semaphore acquire trigger and payload
0x006c          NV1A-         SEMAPHORE_RELEASE         Old-style semaphore release trigger and payload
0x0070          GF100-        ???                       ???
0x0074          GF100-        ???                       ???
0x0078          GF100-        ???                       ???
0x007c          GF100-        ???                       ???
0x0080          NV40-         YIELD                     Yield PFIFO - force channel switch
0x0100:0x2000   NV1:NV4       ...                       Passed down to the engine
0x0100:0x0180   NV4:GF100     ...                       Passed down to the engine
0x0180:0x0200   NV4:GF100     ...                       Passed down to the engine, goes through RAMHT lookup
0x0200:0x2000   NV4:GF100     ...                       Passed down to the engine
0x0100:0x4000   GF100-        ...                       Passed down to the engine

Todo: missing the GF100+ methods
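The last rows of the table amount to a simple routing rule: methods below 0x100 go to the puller, and everything above goes to the engine. On NV4:GF100, the 0x180:0x200 window has its parameter translated through RAMHT first. The sketch below encodes only that rule. The `gen` labels are hypothetical strings, and the sketch does not check whether a particular low method exists on a given generation.

```python
def route_method(mthd, gen):
    """Where a method ends up; gen is a hypothetical label: 'NV1:NV4', 'NV4:GF100' or 'GF100+'."""
    if mthd & 3:
        raise ValueError("method addresses are multiples of 4")
    if mthd < 0x100:
        return "puller"
    limit = 0x4000 if gen == "GF100+" else 0x2000
    if mthd >= limit:
        raise ValueError("outside the range listed in the table")
    if gen == "NV4:GF100" and 0x180 <= mthd < 0x200:
        return "engine, param looked up in RAMHT"
    return "engine"


print(route_method(0x184, "NV4:GF100"))  # → engine, param looked up in RAMHT
```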


RAMHT and the FIFO objects 


As has been already mentioned, each channel has 8 "subchannels" which can be bound to engine objects. On pre-GF100
GPUs, these objects and DMA objects are collectively known as "FIFO objects". FIFO objects and RAMHT don't exist on
GF100+ PFIFO.


The RAMHT is a big hash table that associates arbitrary 32-bit handles with FIFO objects and engine ids. Whenever
a method is mentioned to take an object handle, it means the parameter is looked up in RAMHT. When such a lookup
fails to find a match, a CACHE_ERROR(NO_HASH) error is raised.


NV4:GF100 


Internally, a FIFO object is a [usually small] block of data residing in “instance memory". The instance memory is 
RAMIN for pre-G80 GPUs, and the channel structure for G80+ GPUs. The first few bits of a FIFO object determine 
its ‘class’. Class is 8 bits on NV4:NV25, 12 bits on NV25:NV40, 16 bits on NV40:GF100. 


The data associated with a handle in RAMHT consists of engine id, which determines the object's behavior when 
bound to a subchannel, and its address in RAMIN [pre-G80] or offset from channel structure start [G80+]. 


Apart from method 0, the engine id is ignored. The suitability of an object for a given method is determined by 
reading its class and checking if it makes sense. Most methods other than 0 expect a DMA object, although a couple 
of pre-G80 graph objects have methods that expect other graph objects. 


The following are commonly accepted object classes: 
* 0x0002: DMA object for reading 
* 0x0003: DMA object for writing 
* 0x0030: NULL object - used to effectively unbind a previously bound object 
* 0x003d: DMA object for reading/writing 
Other object classes are engine-specific. 


For more information on DMA objects, see NV3 DMA objects, NV4:G80 DMA objects, or DMA objects.


NV3 


NV3 also has RAMHT, but it's only used for engine objects. While NV3 has DMA objects, they have to be bound
manually by the kernel. Thus, they're not mentioned in RAMHT, and the 0x180-0x1fc methods are not implemented
in hardware - they're instead trapped and emulated in software to behave like NV4+.


NV3 also doesn't use object classes - the object type is instead a 7-bit number encoded in RAMHT along with engine 
id and object address. 


NV1 


You don't want to know how NV1 RAMHT works.
Puller state

type      name                  GPUs          description
b24[8]    ctx                   NV1:NV4       objects bound to subchannels
b3        last_subc             NV1:NV4       last used subchannel
b5[8]     engines               NV4+          engines bound to subchannels
b5        last_engine           NV4+          last used engine
b32       ref                   NV10+         reference counter [shared with pusher]
bool      acquire_active        NV1A+         semaphore acquire in progress
b32       acquire_timeout       NV1A+         semaphore acquire timeout
b32       acquire_timestamp     NV1A+         semaphore acquire timestamp
b32       acquire_value         NV1A+         semaphore acquire value
dmaobj    dma_semaphore         NV11:GF100    semaphore DMA object
b12/16    semaphore_offset      NV11:GF100    old-style semaphore address
bool      semaphore_off_val     G80:GF100     semaphore_offset valid
b40       semaphore_address     G84+          new-style semaphore address
b32       semaphore_sequence    G84+          new-style semaphore value
bool      acquire_source        G84:GF100     semaphore acquire address selection
bool      acquire_mode          G84+          semaphore acquire mode


GF100 state is likely incomplete.


Engine objects 
The main purpose of the puller is relaying methods to the engines. First, an engine object has to be bound to a 
subchannel using method 0. Then, all methods >=0x100 on the subchannel will be forwarded to the relevant engine. 


On pre-NV4, the bound objects' RAMHT information is stored as part of puller state. The last used subchannel is
also remembered, and each time the puller is requested to submit commands on a subchannel different from the last one,
method 0 is submitted, or a channel switch occurs, the information about the object will be forwarded to the engine
through its method 0. The information about an object is 24-bit, is known as the object's "context", and has the
following fields:
* bits 0-15 [NV1]: object flags
* bits 0-15 [NV3]: object address
* bits 16-22: object type
* bit 23: engine id
The context for objects is stored directly in their RAMHT entries.
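As a concrete example of the 24-bit context layout, the sketch below splits a context word into its fields. The mapping of bit 23 to PGRAPH (set) or SOFTWARE (clear) is taken from the OBJECT pseudocode later in this section. The function name and the `gpu` argument are hypothetical.

```python
def decode_pre_nv4_ctx(ctx, gpu):
    """Split a 24-bit pre-NV4 object context; gpu is 'NV1' or 'NV3' (hypothetical helper)."""
    low_name = "flags" if gpu == "NV1" else "address"
    return {
        low_name: ctx & 0xffff,                                # bits 0-15
        "type": (ctx >> 16) & 0x7f,                            # bits 16-22
        "engine": "PGRAPH" if ctx & 0x800000 else "SOFTWARE",  # bit 23
    }


print(decode_pre_nv4_ctx(0x911234, "NV3"))
# → {'address': 4660, 'type': 17, 'engine': 'PGRAPH'}
```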


On NV4+ GPUs, the puller doesn't care about bound objects - this information is supposed to be stored by the engine
itself as part of its state. The puller only remembers what engine each subchannel is bound to. On NV4:GF100, when
method 0 is executed, the puller looks up the object in RAMHT, getting the engine id and object address in return. The
engine id is remembered in puller state, while the object address is passed down to the engine for further processing.


GF100+ did away with RAMHT. Thus, method 0 now takes the object class and engine id directly as parameters:
* bits 0-15: object class. Not used by the puller, simply passed down to the engine.
* bits 16-20: engine id
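The GF100+ method 0 parameter is a plain bit packing of the two fields listed above. The sketch shows both directions. The helper names are hypothetical. The engine id and class values in the usage line are arbitrary, since the real engine id numbers are listed on the FIFO overview page and are not reproduced here.

```python
def gf100_object_param(engine, obj_class):
    """Build the GF100+ method 0 parameter: class in bits 0-15, engine id in bits 16-20."""
    if not 0 <= obj_class <= 0xffff or not 0 <= engine <= 0x1f:
        raise ValueError("field out of range")
    return engine << 16 | obj_class


def split_gf100_object_param(param):
    """Inverse of the above, mirroring the OBJECT pseudocode: (eparam, engine)."""
    return param & 0xffff, param >> 16 & 0x1f


print(hex(gf100_object_param(3, 0x1234)))  # → 0x31234 (arbitrary example values)
```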


The list of valid engine ids can be found on FIFO overview. The SOFTWARE engine is special: all methods submitted
to it, explicitly or implicitly by binding a subchannel to it, will cause a CACHE_ERROR(EMPTY_SUBCHANNEL)
interrupt. This interrupt can then be intercepted by the driver to implement a "software object", or can be treated as an
actual error and reported.


The engines run asynchronously. The puller will send them commands whenever they have space in their input queues 
and won't wait for completion of a command before sending more. However, when engines are switched [ie. puller 
has to submit a command to a different engine than last used by the channel], the puller will wait until the last used 
engine is done with this channel's commands. Several special puller methods will also wait for engines to go idle. 


Todo: verify this on all card families. 


On NV4:GF100 GPUs, methods 0x180-0x1fc are treated specially: while other methods are forwarded directly to the
engine without modification, these methods are expected to take object handles as parameters and will be looked up
in RAMHT by the puller before forwarding. Ie. the engine will get the object's address found in RAMHT.


mthd 0x0000 / 0x000: OBJECT

On NV1:GF100, takes the handle of the object that should be bound to the subchannel it was submitted on. On GF100+,
it instead takes engine+class directly.


if (gpu < NV4) {
    b24 newctx = RAMHT_LOOKUP(param);
    if (newctx & 0x800000) {
        /* engine == PGRAPH */
        if (ENGINE_CUR_CHANNEL(PGRAPH) != chan)
            ENGINE_CHANNEL_SWITCH(PGRAPH, chan);
        ENGINE_SUBMIT_MTHD(PGRAPH, subc, 0, newctx);
        ctx[subc] = newctx;
        last_subc = subc;
    } else {
        /* engine == SOFTWARE */
        while (!ENGINE_IDLE(PGRAPH))
            ;
        throw CACHE_ERROR(EMPTY_SUBCHANNEL);
    }
} else {
    /* NV4+ GPU */
    b5 engine; b16 eparam;
    if (gpu >= GF100) {
        eparam = param & 0xffff;
        engine = param >> 16 & 0x1f;
        /* XXX: behavior with more bitfields? does it forward the whole thing? */
    } else {
        engine = RAMHT_LOOKUP(param).engine;
        eparam = RAMHT_LOOKUP(param).addr;
    }
    if (engine != last_engine) {
        while (ENGINE_CUR_CHANNEL(last_engine) == chan && !ENGINE_IDLE(last_engine))
            ;
    }
    if (engine == SOFTWARE) {
        throw CACHE_ERROR(EMPTY_SUBCHANNEL);
    } else {
        if (ENGINE_CUR_CHANNEL(engine) != chan)
            ENGINE_CHANNEL_SWITCH(engine, chan);
        ENGINE_SUBMIT_MTHD(engine, subc, 0, eparam);
        last_engine = engines[subc] = engine;
    }
}


mthd 0x0100-0x3ffc / 0x040-0xfff: [forwarded to engine]


if (gpu < NV4) {
    if (subc != last_subc) {
        if (ctx[subc] & 0x800000) {
            /* engine == PGRAPH */
            if (ENGINE_CUR_CHANNEL(PGRAPH) != chan)
                ENGINE_CHANNEL_SWITCH(PGRAPH, chan);
            ENGINE_SUBMIT_MTHD(PGRAPH, subc, 0, ctx[subc]);
            last_subc = subc;
        } else {
            /* engine == SOFTWARE */
            while (!ENGINE_IDLE(PGRAPH))
                ;
            throw CACHE_ERROR(EMPTY_SUBCHANNEL);
        }
    }
    if (ctx[subc] & 0x800000) {
        /* engine == PGRAPH */
        if (ENGINE_CUR_CHANNEL(PGRAPH) != chan)
            ENGINE_CHANNEL_SWITCH(PGRAPH, chan);
        ENGINE_SUBMIT_MTHD(PGRAPH, subc, mthd, param);
    } else {
        /* engine == SOFTWARE */
        while (!ENGINE_IDLE(PGRAPH))
            ;
        throw CACHE_ERROR(EMPTY_SUBCHANNEL);
    }
} else {
    /* NV4+ */
    if (gpu < GF100 && mthd >= 0x180/4 && mthd < 0x200/4) {
        param = RAMHT_LOOKUP(param).addr;
    }
    if (engines[subc] != last_engine) {
        while (ENGINE_CUR_CHANNEL(last_engine) == chan && !ENGINE_IDLE(last_engine))
            ;
    }
    if (engines[subc] == SOFTWARE) {
        throw CACHE_ERROR(EMPTY_SUBCHANNEL);
    } else {
        if (ENGINE_CUR_CHANNEL(engines[subc]) != chan)
            ENGINE_CHANNEL_SWITCH(engines[subc], chan);
        ENGINE_SUBMIT_MTHD(engines[subc], subc, mthd, param);
        last_engine = engines[subc];
    }
}


Todo: verify all of the pseudocode... 


Puller builtin methods 
Syncing with host: reference counter


NV10 introduced a "reference counter". It's a per-channel 32-bit register that is writable by the puller and readable 
through the channel control area [see DMA submission to FIFOs on NV4]. It can be used to tell host which commands 
have already completed: after every interesting batch of commands, add a method that will set the ref counter to 
monotonically increasing values. The host code can then read the counter from channel control area and deduce which 
batches are already complete. 


The method to set the reference counter is REF_CNT, and it simply sets the ref counter to its parameter. When it's
executed, it'll also wait for all previously submitted commands to complete execution.


mthd 0x0050 / 0x014: REF_CNT [NV10:]


while (ENGINE_CUR_CHANNEL(last_engine) == chan && !ENGINE_IDLE(last_engine))
    ;
ref = param;
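On the host side, the reference counter is used as described above: each batch ends with a REF_CNT method carrying a monotonically increasing batch id, and the driver later compares the counter it reads against the ids it has handed out. The sketch below makes one assumption beyond the text. It allows the ids to wrap at 2**32 and compares them with a signed 32-bit difference, the same trick the "acquire greater or equal" semaphore operation uses. The function name is hypothetical.

```python
def batch_complete(ref, batch_id):
    """True if the ref counter value `ref` shows that batch `batch_id` has completed.

    Assumption: batch ids may wrap at 2**32, so "ref >= batch_id" is evaluated as a
    signed 32-bit difference rather than a plain integer comparison.
    """
    return ((ref - batch_id) & 0xffffffff) < 0x80000000


print(batch_complete(1, 0xffffffff))  # → True: the counter has wrapped past batch 0xffffffff
```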


Semaphores 


NV1A PFIFO introduced a concept of "semaphores". A semaphore is a 32-bit word located in memory. G84 also
introduced "long" semaphores, which are 4-word memory structures that include a normal semaphore word and a
timestamp.


The PFIFO semaphores can be “acquired” and “released”. Note that these operations are NOT the familiar P/V 
semaphore operations, they're just fancy names for “wait until value == X" and “write X". 


There are two "versions" of the semaphore functionality. The "old-style" semaphores are implemented by
NV1A:GF100 GPUs. The "new-style" semaphores are supported by G84+ GPUs. The differences are:


Old-style semaphores
* limited addressing range: 12-bit [NV1A:G80] or 16-bit [G80:GF100] offset in a DMA object. Thus a special
  DMA object is required.
* release writes a single word
* acquire supports only "wait for value equal to X" mode

New-style semaphores
* full 40-bit addressing range
* release writes word + timestamp, ie. long semaphore
* acquire supports "wait for value equal to X" and "wait for value greater or equal X" modes


Semaphores have to be 4-byte aligned. All values are stored with the endianness selected by the big_endian flag
[NV1A:G80] or by PFIFO endianness [G80+].


On pre-GF100, both old-style and new-style semaphores use the DMA object stored in dma_semaphore, which can be
set through the DMA_SEMAPHORE method. Note that this method is buggy on pre-G80 GPUs and accepts only
write-only DMA objects of class 0x0002. You have to work around the bug by preparing such DMA objects [or
using a kernel that intercepts the error and does the binding manually].


Old-style semaphores read/write the location specified in semaphore_offset, which can be set by the
SEMAPHORE_OFFSET method. The offset has to be divisible by 4 and fit in 12 bits [NV1A:G80] or 16 bits
[G80:GF100]. An acquire is triggered by using the SEMAPHORE_ACQUIRE method with the expected value as the
parameter - further command processing will halt until the memory location contains the selected value. A release is
triggered by using the SEMAPHORE_RELEASE method with the value as parameter - the value will be written into
the semaphore location.
New-style semaphores use the location specified in semaphore_address, whose low/high parts can be set
through the SEMAPHORE_ADDRESS_HIGH and SEMAPHORE_ADDRESS_LOW methods. The value for acquire/release is
stored in semaphore_sequence and specified by the SEMAPHORE_SEQUENCE method. Acquire and release are triggered
by using the SEMAPHORE_TRIGGER method with the requested operation as parameter.
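A driver setting up a new-style semaphore needs to split its 40-bit address into the two method parameters. The sketch performs the same checks as the SEMAPHORE_ADDRESS_HIGH/LOW pseudocode in this section: the low part must be 4-byte aligned, and the high part must fit in 8 bits. The helper name is hypothetical.

```python
def semaphore_address_params(addr):
    """Split a 40-bit semaphore address into (ADDRESS_HIGH, ADDRESS_LOW) parameters (hypothetical helper)."""
    if addr & 3:
        raise ValueError("ADDRESS_UNALIGNED")
    if addr >> 40:
        raise ValueError("ADDRESS_TOO_LARGE")
    return addr >> 32, addr & 0xffffffff


print([hex(v) for v in semaphore_address_params(0x1234567890)])  # → ['0x12', '0x34567890']
```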


The new-style release operation writes the following 16-byte structure to memory at semaphore_address:
* 0x00: [32-bit] semaphore_sequence
* 0x04: [32-bit] 0
* 0x08: [64-bit] PTIMER timestamp [see ptimer]
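Here is the 16-byte long-semaphore layout expressed with Python's standard `struct` module. The sketch is useful for parsing the structure from a memory dump. The `big_endian` flag stands in for the PFIFO endianness setting, which is an assumption of this sketch. The helper name is hypothetical.

```python
import struct


def pack_long_semaphore(sequence, timestamp, big_endian=False):
    """Bytes written by a new-style release: sequence, zero word, 64-bit PTIMER timestamp."""
    fmt = (">" if big_endian else "<") + "IIQ"   # 4 + 4 + 8 = 16 bytes, no padding
    return struct.pack(fmt, sequence, 0, timestamp)


blob = pack_long_semaphore(0x11223344, 0x0102030405060708)
print(len(blob), blob[:4].hex())  # → 16 44332211
```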


The new-style "acquire equal" operation behaves exactly like the old-style acquire, but uses semaphore_address instead
of semaphore_offset and semaphore_sequence instead of the SEMAPHORE_ACQUIRE parameter. The "acquire greater or
equal" operation, instead of waiting for the semaphore value to be equal to semaphore_sequence, waits for a value that
satisfies (int32_t)(val - semaphore_sequence) >= 0, ie. for a value that's greater than or equal to semaphore_sequence
in 32-bit wrapping arithmetic. The "acquire mask" operation waits for a value that, ANDed with semaphore_sequence,
gives a non-0 result [GF100+ only].
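The three acquire conditions can be written as one predicate. In the sketch, the signed comparison `(int32_t)(val - semaphore_sequence) >= 0` is implemented as "the 32-bit difference is below 2**31", which is equivalent. The function name and the string operation names are hypothetical.

```python
def acquire_satisfied(op, word, sequence):
    """Would an acquire of kind op succeed for the 32-bit memory value `word`? (hypothetical helper)"""
    word &= 0xffffffff
    sequence &= 0xffffffff
    if op == "EQUAL":
        return word == sequence
    if op == "GEQUAL":
        # (int32_t)(word - sequence) >= 0, i.e. greater or equal in wrapping arithmetic
        return ((word - sequence) & 0xffffffff) < 0x80000000
    if op == "MASK":   # GF100+ only
        return (word & sequence) != 0
    raise ValueError(op)


print(acquire_satisfied("GEQUAL", 0x00000002, 0xfffffffe))  # → True (value wrapped past 0)
```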


Failures of semaphore-related methods will trigger the SEMAPHORE error. The SEMAPHORE error has several 
subtypes, depending on card generation. 


NV1A:G80 SEMAPHORE error subtypes:
* 1: INVALID_OPERAND: wrong parameter to a method
* 2: INVALID_STATE: attempt to acquire/release without proper setup

G80:GF100 SEMAPHORE error subtypes:
* 1: ADDRESS_UNALIGNED: address not divisible by 4
* 2: INVALID_STATE: attempt to acquire/release without proper setup
* 3: ADDRESS_TOO_LARGE: attempt to set >40-bit address or >16-bit offset
* 4: MEM_FAULT: got VM fault when reading/writing semaphore

GF100 SEMAPHORE error subtypes:


Todo: figure this out 


If the acquire doesn't immediately succeed, the acquire parameters are written to puller state, and the read will be
periodically retried. Further puller processing will be blocked on the current channel until the acquire succeeds. Note
that, on G84+ GPUs, the retry reads are issued from the SEMAPHORE_BG VM engine instead of the PFIFO VM engine.
There's also apparently a timeout, but it hasn't been reverse-engineered yet.


Todo: RE timeouts 


mthd 0x0060 / 0x018: DMA_SEMAPHORE [O] [NV1A:GF100]


obj = RAMHT_LOOKUP(param).addr;
if (gpu < G80) {
    if (OBJECT_CLASS(obj) != 2)
        throw SEMAPHORE(INVALID_OPERAND);
    if (DMAOBJ_RIGHTS(obj) != WO)
        throw SEMAPHORE(INVALID_OPERAND);
    if (!DMAOBJ_PT_PRESENT(obj))
        throw SEMAPHORE(INVALID_OPERAND);
}
/* G80 doesn't bother with verification */
dma_semaphore = obj;


Todo: is there ANY way to make G80 reject non-DMA object classes? 


mthd 0x0064 / 0x019: SEMAPHORE_OFFSET [NV1A-]


if (gpu < G80) {
    if (param & ~0xffc)
        throw SEMAPHORE(INVALID_OPERAND);
    semaphore_offset = param;
} else if (gpu < GF100) {
    if (param & 3)
        throw SEMAPHORE(ADDRESS_UNALIGNED);
    if (param & 0xffff0000)
        throw SEMAPHORE(ADDRESS_TOO_LARGE);
    semaphore_offset = param;
    semaphore_off_val = 1;
} else {
    semaphore_address[0:31] = param;
}


mthd 0x0068 / 0x01a: SEMAPHORE_ACQUIRE [NV1A-]


if (gpu < G80 && !dma_semaphore)
    /* unbound DMA object */
    throw SEMAPHORE(INVALID_STATE);
if (gpu >= G80 && !semaphore_off_val)
    throw SEMAPHORE(INVALID_STATE);
b32 word;
if (gpu < G80) {
    word = READ_DMAOBJ_32(dma_semaphore, semaphore_offset, big_endian?BE:LE);
} else {
    try {
        word = READ_DMAOBJ_32(dma_semaphore, semaphore_offset, pfifo_endian);
    } catch (VM_FAULT) {
        throw SEMAPHORE(MEM_FAULT);
    }
}
if (word == param) {
    /* already done */
} else {
    /* acquire_active will block further processing and schedule retries */
    acquire_active = 1;
    acquire_value = param;
    acquire_timestamp = ???;
    /* XXX: figure out timestamp/timeout business */
    if (gpu >= G80) {
        acquire_mode = 0;
        acquire_source = 0;
    }
}
mthd 0x006c / 0x01b: SEMAPHORE_RELEASE [NV1A-]


if (gpu < G80 && !dma_semaphore)
    /* unbound DMA object */
    throw SEMAPHORE(INVALID_STATE);
if (gpu >= G80 && !semaphore_off_val)
    throw SEMAPHORE(INVALID_STATE);
if (gpu < G80) {
    WRITE_DMAOBJ_32(dma_semaphore, semaphore_offset, param, big_endian?BE:LE);
} else {
    try {
        WRITE_DMAOBJ_32(dma_semaphore, semaphore_offset, param, pfifo_endian);
    } catch (VM_FAULT) {
        throw SEMAPHORE(MEM_FAULT);
    }
}


mthd 0x0010 / 0x004: SEMAPHORE_ADDRESS_HIGH [G84:]


if (param & 0xffffff00)
    throw SEMAPHORE(ADDRESS_TOO_LARGE);
semaphore_address[32:39] = param;


mthd 0x0014 / 0x005: SEMAPHORE_ADDRESS_LOW [G84:] 


if (param & 3)
    throw SEMAPHORE(ADDRESS_UNALIGNED);
semaphore_address[0:31] = param;


mthd 0x0018 / 0x006: SEMAPHORE_SEQUENCE [G84:] 


semaphore_sequence = param; 


mthd 0x001c / 0x007: SEMAPHORE_TRIGGER [G84:]
bits 0-2: operation
* 1: ACQUIRE_EQUAL
* 2: WRITE_LONG
* 4: ACQUIRE_GEQUAL
* 8: ACQUIRE_MASK [GF100-]


Todo: bit 12 does something on GF100? 


op = param & 7;
b64 timestamp = PTIMER_GETTIME();
if (param == 2) {
    if (gpu < GF100) {
        try {
            WRITE_DMAOBJ_32(dma_semaphore, semaphore_address+0x0, semaphore_sequence, pfifo_endian);
            WRITE_DMAOBJ_32(dma_semaphore, semaphore_address+0x4, 0, pfifo_endian);
            WRITE_DMAOBJ_64(dma_semaphore, semaphore_address+0x8, timestamp, pfifo_endian);
        } catch (VM_FAULT) {
            throw SEMAPHORE(MEM_FAULT);
        }
    } else {
        WRITE_VM_32(semaphore_address+0x0, semaphore_sequence, pfifo_endian);
        WRITE_VM_32(semaphore_address+0x4, 0, pfifo_endian);
        WRITE_VM_64(semaphore_address+0x8, timestamp, pfifo_endian);
    }
} else {
    b32 word;
    if (gpu < GF100) {
        try {
            word = READ_DMAOBJ_32(dma_semaphore, semaphore_address, pfifo_endian);
        } catch (VM_FAULT) {
            throw SEMAPHORE(MEM_FAULT);
        }
    } else {
        word = READ_VM_32(semaphore_address, pfifo_endian);
    }
    if ((op == 1 && word == semaphore_sequence) ||
        (op == 4 && (int32_t)(word - semaphore_sequence) >= 0) ||
        (op == 8 && word & semaphore_sequence)) {
        /* already done */
    } else {
        /* XXX GF100 */
        acquire_source = 1;
        acquire_value = semaphore_sequence;
        acquire_timestamp = ???;
        if (op == 1) {
            acquire_active = 1;
            acquire_mode = 0;
        } else if (op == 4) {
            acquire_active = 1;
            acquire_mode = 1;
        } else {
            /* invalid combination - results in hang */
        }
    }
}


Misc puller methods 


NV40 introduced the YIELD method which, if there are any other busy channels at the moment, will cause PFIFO to
switch to another channel immediately, without waiting for the timeslice to expire.

mthd 0x0080 / 0x020: YIELD [NV40:]

PFIFO_YIELD();


G84 introduced the NOTIFY_INTR method, which simply raises an interrupt that notifies the host of its execution. It
can be used for sync primitives.

mthd 0x0020 / 0x008: NOTIFY_INTR [G84:]

PFIFO_NOTIFY_INTR();
Todo: check how this is reported on GF100


The G84+ WRCACHE_FLUSH method can be used to flush PFIFO's write post caches. [see Tesla virtual memory]

mthd 0x0024 / 0x009: WRCACHE_FLUSH [G84:]

VM_WRCACHE_FLUSH(PFIFO);

The GF100+ NOP method does nothing:

mthd 0x0008 / 0x002: NOP [GF100:]


/* do nothing */


2.9 PGRAPH: 2d/3d graphics and compute engine 


Contents: 


2.9.1 PGRAPH overview 


Contents 


* PGRAPH overview
  - Introduction
  - NV1/NV3 graph object types
  - NV4+ graph object classes
  - The NULL object
  - The graphics context
    * Channel context
    * Graph object options
    * Volatile state
  - Notifiers
    * NOTIFY method
    * DMA_NOTIFY method
    * NOP method


Introduction 


Todo: write me 
Todo: WAIT_FOR_IDLE and PM_TRIGGER


NV1/NV3 graph object types 


The following graphics objects exist on NV1:NV4: 


id     variants   name           description
0x01   all        BETA           sets beta factor for blending
0x02   all        ROP            sets raster operation
0x03   all        CHROMA         sets color for color key
0x04   all        PLANE          sets the plane mask
0x05   all        CLIP           sets clipping rectangle
0x06   all        PATTERN        sets pattern, ie. a small repeating image used as one of the inputs to a raster
                                 operation or blending
0x07   NV3:NV4    RECT           renders solid rectangles
0x08   all        POINT          renders single points
0x09   all        LINE           renders solid lines
0x0a   all        LIN            renders solid lins [ie. lines missing a pixel on one end]
0x0b   all        TRI            renders solid triangles
0x0c   NV1:NV3    RECT           renders solid rectangles
0x0c   NV3:NV4    GDI            renders Windows 95 primitives: rectangles and characters, with font read from
                                 a DMA object
0x0d   NV1:NV3    TEXLIN         renders quads with linearly mapped textures
0x0d   NV3:NV4    M2MF           copies data from one DMA object to another
0x0e   NV1:NV3    TEXQUAD        renders quads with quadratically mapped textures
0x0e   NV3:NV4    SIFM           Scaled Image From Memory, like NV1's IFM, but with scaling
0x10   all        BLIT           copies rectangles of pixels from one place in framebuffer to another
0x11   all        IFC            Image From CPU, uploads a rectangle of pixels via methods
0x12   all        BITMAP         uploads and expands a bitmap [ie. 1bpp image] via methods
0x13   NV1:NV3    IFM            Image From Memory, uploads a rectangle of pixels from a DMA object to
                                 framebuffer
0x14   all        ITM            Image To Memory, downloads a rectangle of pixels to a DMA object from
                                 framebuffer
0x15   NV3:NV4    SIFC           Stretched Image From CPU, like IFC, but with image stretching
0x17   NV3:NV4    D3D            Direct3D 5 textured triangles
0x18   NV3:NV4    ZPOINT         renders single points to a surface with depth buffer
0x1c   NV3:NV4    SURF           sets rendering surface parameters
0x1d   NV1:NV3    TEXLINBETA     renders lit quads with linearly mapped textures
0x1e   NV1:NV3    TEXQUADBETA    renders lit quads with quadratically mapped textures


Todo: check Direct3D version 
NV4+ graph object classes


Not really graph objects, but usable as parameters for some object-bind methods [all NV4:GF100]: 


class   name       description
0x0030  NV1_NULL   does nothing
0x0002  NV1_DMA_R  DMA object for reading
0x0003  NV1_DMA_W  DMA object for writing
0x003d  NV3_DMA    read/write DMA object


Todo: document NV1_NULL


NV1-style operation objects [all NV4:NV5]:


class   name                    description
0x0010  NV1_OP_CLIP             clipping
0x0011  NV1_OP_BLEND_AND        blending
0x0013  NV1_OP_ROP_AND          raster operation
0x0015  NV1_OP_CHROMA           color key
0x0064  NV1_OP_SRCCOPY_AND      source copy with 0-alpha discard
0x0065  NV3_OP_SRCCOPY          source copy
0x0066  NV4_OP_SRCCOPY_PREMULT  pre-multiplying copy
0x0067  NV4_OP_BLEND_PREMULT    pre-multiplied blending


Memory to memory copy objects: 


class   variants            name        description
0x0039  NV4:G80             NV3_M2MF    copies data from one buffer to another
0x5039  G80:GF100           G80_M2MF    copies data from one buffer to another
0x9039  GF100:GK104         GF100_M2MF  copies data from one buffer to another
0xa040  GK104:GK110 GK20A   GK104_P2MF  copies data from FIFO to memory buffer
0xa140  GK110:GK20A GM107-  GK110_P2MF  copies data from FIFO to memory buffer


Context objects: 
class   variants   name            description
0x0012  NV4:G84    NV1_BETA        sets beta factor for blending
0x0017  NV4:G80    NV1_CHROMA      sets color for color key
0x0057  NV4:G84    NV4_CHROMA      sets color for color key
0x0018  NV4:G80    NV1_PATTERN     sets pattern for raster op
0x0044  NV4:G84    NV1_PATTERN     sets pattern for raster op
0x0019  NV4:G84    NV1_CLIP        sets user clipping rectangle
0x0043  NV4:G84    NV1_ROP         sets raster operation
0x0072  NV4:G84    NV4_BETA4       sets component beta factors for pre-multiplied blending
0x0058  NV4:G80    NV3_SURF_DST    sets the 2d destination surface
0x0059  NV4:G80    NV3_SURF_SRC    sets the 2d blit source surface
0x005a  NV4:G80    NV3_SURF_COLOR  sets the 3d color surface
0x005b  NV4:G80    NV3_SURF_ZETA   sets the 3d zeta surface
0x0052  NV4:G80    NV4_SWZSURF     sets 2d swizzled destination surface
0x009e  NV10:G80   NV10_SWZSURF    sets 2d swizzled destination surface
0x039e  NV30:NV40  NV30_SWZSURF    sets 2d swizzled destination surface
0x309e  NV40:G80   NV30_SWZSURF    sets 2d swizzled destination surface
0x0042  NV4:G80    NV4_SURF2D      sets 2d destination and source surfaces
0x0062  NV10:G80   NV10_SURF2D     sets 2d destination and source surfaces
0x0362  NV30:NV40  NV30_SURF2D     sets 2d destination and source surfaces
0x3062  NV40:G80   NV30_SURF2D     sets 2d destination and source surfaces
0x5062  G80:G84    G80_SURF2D      sets 2d destination and source surfaces
0x0053  NV4:NV20   NV4_SURF3D      sets 3d color and zeta surfaces
0x0093  NV10:NV20  NV10_SURF3D     sets 3d color and zeta surfaces


Solids rendering objects: 


class   variants   name      description
0x001c  NV4:NV40   NV1_LIN   renders a lin
0x005c  NV4:G80    NV4_LIN   renders a lin
0x035c  NV30:NV40  NV30_LIN  renders a lin
0x305c  NV40:G84   NV30_LIN  renders a lin
0x001d  NV4:NV40   NV1_TRI   renders a triangle
0x005d  NV4:G84    NV4_TRI   renders a triangle
0x001e  NV4:NV40   NV1_RECT  renders a rectangle
0x005e  NV4:NV40   NV4_RECT  renders a rectangle


Image upload from CPU objects: 
class   variants   name          description
0x0021  NV4:NV40   NV1_IFC       image from CPU
0x0061  NV4:G80    NV4_IFC       image from CPU
0x0065  NV5:G80    NV5_IFC       image from CPU
0x008a  NV10:G80   NV10_IFC      image from CPU
0x038a  NV30:NV40  NV30_IFC      image from CPU
0x308a  NV40:G84   NV40_IFC      image from CPU
0x0036  NV4:G80    NV1_SIFC      stretched image from CPU
0x0076  NV4:G80    NV4_SIFC      stretched image from CPU
0x0066  NV5:G80    NV5_SIFC      stretched image from CPU
0x0366  NV30:NV40  NV30_SIFC     stretched image from CPU
0x3066  NV40:G84   NV40_SIFC     stretched image from CPU
0x0060  NV4:G80    NV4_INDEX     indexed image from CPU
0x0064  NV5:G80    NV5_INDEX     indexed image from CPU
0x0364  NV30:NV40  NV30_INDEX    indexed image from CPU
0x3064  NV40:G84   NV40_INDEX    indexed image from CPU
0x007b  NV10:G80   NV10_TEXTURE  texture from CPU
0x037b  NV30:NV40  NV30_TEXTURE  texture from CPU
0x307b  NV40:G80   NV40_TEXTURE  texture from CPU


Todo: figure out wtf is the deal with TEXTURE objects 


Other 2d source objects: 


class   variants   name       description
0x001f  NV4:G80    NV1_BLIT   blits inside framebuffer
0x005f  NV4:G84    NV4_BLIT   blits inside framebuffer
0x009f  NV15:G80   NV15_BLIT  blits inside framebuffer
0x0037  NV4:G80    NV3_SIFM   scaled image from memory
0x0077  NV4:G80    NV4_SIFM   scaled image from memory
0x0063  NV10:G80   NV5_SIFM   scaled image from memory
0x0089  NV10:NV40  NV10_SIFM  scaled image from memory
0x0389  NV30:NV40  NV30_SIFM  scaled image from memory
0x3089  NV40:G80   NV30_SIFM  scaled image from memory
0x5089  G80:G84    G80_SIFM   scaled image from memory
0x004b  NV4:NV40   NV3_GDI    draws GDI primitives
0x004a  NV4:G80    NV4_GDI    draws GDI primitives


YCbCr two-source blending objects:


class   variants  name
0x0038  NV4:G80   NV4_DVD_SUBPICTURE
0x0088  NV10:G80  NV10_DVD_SUBPICTURE


Todo: find better name for these two 


Unified 2d objects: 
class   variants   name
0x502d  G80:GF100  G80_2D
0x902d  GF100-     GF100_2D


NV3-style 3d objects:


class   variants    name       description
0x0048  NV4:NV15    NV3_D3D    Direct3D textured triangles
0x0054  NV4:NV20    NV4_D3D5   Direct3D 5 textured triangles
0x0094  NV10:NV20   NV10_D3D5  Direct3D 5 textured triangles
0x0055  NV4:NV20    NV4_D3D6   Direct3D 6 multitextured triangles
0x0095  NV10:NV20   NV10_D3D6  Direct3D 6 multitextured triangles


Todo: check NV3 D3D version


NV10-style 3d objects:


class   variants      name      description
0x0056  NV10:NV30     NV10_3D   Celsius Direct3D 7 engine
0x0096  NV15:NV30     NV15_3D   Celsius Direct3D 7 engine
0x0098  NV17:NV20     NV11_3D   Celsius Direct3D 7 engine
0x0099  NV17:NV20     NV17_3D   Celsius Direct3D 7 engine
0x0097  NV20:NV34     NV20_3D   Kelvin Direct3D 8 SM 1 engine
0x0597  NV25:NV40     NV25_3D   Kelvin Direct3D 8 SM 1 engine
0x0397  NV30:NV40     NV30_3D   Rankine Direct3D 9 SM 2 engine
0x0497  NV35:NV34     NV35_3D   Rankine Direct3D 9 SM 2 engine
0x3597  NV40:NV41     NV35_3D   Rankine Direct3D 9 SM 2 engine
0x0697  NV34:NV40     NV34_3D   Rankine Direct3D 9 SM 2 engine
0x4097  NV40:G80 !TC  NV40_3D   Curie Direct3D 9 SM 3 engine
0x4497  NV40:G80 TC   NV44_3D   Curie Direct3D 9 SM 3 engine
0x5097  G80:G200      G80_3D    Tesla Direct3D 10 engine
0x8297  G84:G200      G84_3D    Tesla Direct3D 10 engine
0x8397  G200:GT215    G200_3D   Tesla Direct3D 10 engine
0x8597  GT215:MCP89   GT215_3D  Tesla Direct3D 10.1 engine
0x8697  MCP89:GF100   MCP89_3D  Tesla Direct3D 10.1 engine
0x9097  GF100:GK104   GF100_3D  Fermi Direct3D 11 engine
0x9197  GF108:GK104   GF108_3D  Fermi Direct3D 11 engine
0x9297  GF110:GK104   GF110_3D  Fermi Direct3D 11 engine
0xa097  GK104:GK110   GK104_3D  Kepler Direct3D 11.1 engine
0xa197  GK110:GK20A   GK110_3D  Kepler Direct3D 11.1 engine
0xa297  GK20A:GM107   GK20A_3D  Kepler Direct3D 11.1 engine
0xb097  GM107-        GM107_3D  Maxwell Direct3D 12 engine


And the compute objects: 
class   variants                 name           description
0x50c0  G80:GF100                G80_COMPUTE    CUDA 1.x engine
0x85c0  GT215:GF100              GT215_COMPUTE  CUDA 1.x engine
0x90c0  GF100:GK104              GF100_COMPUTE  CUDA 2.x engine
0x91c0  GF110:GK104              GF110_COMPUTE  CUDA 2.x engine
0xa0c0  GK104:GK110 GK20A:GM107  GK104_COMPUTE  CUDA 3.x engine
0xa1c0  GK110:GK20A              GK110_COMPUTE  CUDA 3.x engine
0xb0c0  GM107:GM204              GM107_COMPUTE  CUDA 4.x engine
0xb1c0  GM204:-                  GM200_COMPUTE  CUDA 4.x engine


The NULL object 


Todo: write me 


The graphics context 


Todo: write something here 


Channel context 


The following information makes up non-volatile graphics context. This state is per-channel and thus will apply to all 
objects on it, unless software does trap-swap-restart trickery with object switches. It is guaranteed to be unaffected 
by subchannel switches and object binds. Some of this state can be set by submitting methods on the context objects, 
some can only be set by accessing PGRAPH context registers. 


* the beta factor - set by BETA object
* the 8-bit raster operation - set by ROP object
* the A1R10G10B10 color for chroma key - set by CHROMA object
* the A1R10G10B10 color for plane mask - set by PLANE object
* the user clip rectangle - set by CLIP object:

  - ???

* the pattern state - set by PATTERN object:

  - shape: 8x8, 64x1, or 1x64
  - 2x A8R10G10B10 pattern color
  - the 64-bit pattern itself

* the NOTIFY DMA object - pointer to DMA object used by NOTIFY methods. NV1 only - moved to graph
  object options on NV3+. Set by direct PGRAPH access only.
* the main DMA object - pointer to DMA object used by IFM and ITM objects. NV1 only - moved to graph
  object options on NV3+. Set by direct PGRAPH access only.
* On NV1, framebuffer setup - set by direct PGRAPH access only:

  - ???

* On NV3+, rendering surface setup:

  - ???

  There are 4 copies of this state, one for each surface used by PGRAPH:

  - DST - the 2d destination surface
  - SRC - the 2d source surface [used by BLIT object only]
  - COLOR - the 3d color surface
  - ZETA - the 3d depth surface


Note that the M2MF source/destination, ITM destination, IFM/SIFM source, and D3D texture don't count as 
surfaces - even though they may be configured to access the same data as surfaces on NV3+, they're accessed 
through the DMA circuitry, not the surface circuitry, and their setup is part of volatile state. 


Todo: beta factor size 


Todo: user clip state 


Todo: NV1 framebuffer setup 


Todo: NV3 surface setup 


Todo: figure out the extra clip stuff, etc. 


Todo: update for NV4+


Graph object options 


In addition to the per-channel state, there is also per-object non-volatile state, called graph object options. This state 
is stored in the RAMHT entry for the object [NV1], or in a RAMIN structure [NV3-]. On subchannel switches and 
object binds, the PFIFO will send this state [NV1] or the pointer to this state [NV3-] to PGRAPH via method 0. On 
NV1:NV4, this state cannot be modified by any object methods and requires RAMHT/RAMIN access to change. On
NV4+, PGRAPH can bind DMA objects on its own when requested via methods, and update the DMA object pointers 
in RAMIN. On NV5+, PGRAPH can modify most of this state when requested via methods. All NV4+ automatic 
options modification methods can be disabled by software, if so desired. 


The graph options contain the following information: 
* 2d pipeline configuration 
* 2d color and mono format 


* NOTIFY_VALID flag - if set, NOTIFY method will be enabled. If unset, NOTIFY method will cause an
  interrupt. Can be used by the driver to emulate per-object DMA_NOTIFY setting - this flag will be set on
  objects whose emulated DMA_NOTIFY value matches the one currently in PGRAPH context, and the interrupt
  will cause a switch of the PGRAPH context value followed by a method restart.


* SUBCONTEXT_ID - a single-bit flag that can be used to emulate more than one PGRAPH context on one
  channel. When an object is bound and its SUBCONTEXT_ID doesn't match PGRAPH's current SUBCONTEXT_ID,
  a context switch interrupt is raised to allow software to load an alternate context.
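The NOTIFY_VALID trick for emulating per-object DMA_NOTIFY can be sketched as a small driver-side model. This is a minimal sketch, assuming a hypothetical `NotifyEmulation` class whose dictionaries stand in for the real graph options in RAMHT/RAMIN and for the NOTIFY DMA value in the PGRAPH context; the handles and DMA object names are placeholders, not real driver structures.

```python
class NotifyEmulation:
    """Hypothetical driver-side model of per-object DMA_NOTIFY emulation via NOTIFY_VALID."""

    def __init__(self):
        self.pgraph_notify_dma = None  # NOTIFY DMA value currently loaded in the PGRAPH context
        self.emulated_dma = {}         # object handle -> emulated per-object DMA_NOTIFY value
        self.notify_valid = {}         # object handle -> NOTIFY_VALID graph option flag

    def add_object(self, handle, dma):
        self.emulated_dma[handle] = dma
        self._refresh_flags()

    def _refresh_flags(self):
        # NOTIFY_VALID is set only on objects whose emulated value matches PGRAPH's value
        for handle, dma in self.emulated_dma.items():
            self.notify_valid[handle] = dma is not None and dma == self.pgraph_notify_dma

    def on_invalid_notify(self, handle):
        """Interrupt handler for a NOTIFY submitted on an object with NOTIFY_VALID unset."""
        wanted = self.emulated_dma[handle]
        if wanted is None:
            raise RuntimeError("object has no notifier DMA object to swap in")
        self.pgraph_notify_dma = wanted  # switch the PGRAPH context value
        self._refresh_flags()            # re-mark which objects may now use NOTIFY directly
        return "restart"                 # the trapped NOTIFY method is then restarted
```

Switching the PGRAPH value and re-deriving every object's flag keeps the invariant stated above: an object can issue NOTIFY without trapping exactly when its emulated value is the one loaded in PGRAPH.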


Todo: NV3+ 


See nv1-pgraph for detailed format. 


Volatile state 


In addition to the non-volatile state described above, PGRAPH also has plenty of “volatile” state. This state deals 
with the currently requested operation and may be destroyed by switching to a new subchannel or binding a new 
object [though not by full channel switches - the channels are supposed to be independent after all, and kernel driver 
is supposed to save/restore all state, including volatile state]. 


Volatile state is highly object-specific, but common stuff is listed here: 


* the "notifier write pending" flag and requested notification type 


Todo: more stuff? 


Notifiers 


The notifiers are 16-byte memory structures accessed via DMA objects, used for synchronization. Notifiers are written 
by PGRAPH when certain operations are completed. Software can poll on the memory structure, waiting for it to be 
written by PGRAPH. The notifier structure is: 


base+0x0: 64-bit timestamp - written by PGRAPH with current PTIMER time as of the notifier write. The timestamp
is a concatenation of current values of TIME_LOW and TIME_HIGH registers. When big-endian mode is in
effect, this becomes a 64-bit big-endian number as expected.


base+0x8: 32-bit word always set to 0 by PGRAPH. This field may be used by software to put a non-0 value for
software-written error-caused notifications.


base+0xc: 32-bit word always set to 0 by PGRAPH. This is used for synchronization - the software is supposed to
set this field to a non-0 value before submitting the notifier write request, then wait for it to become 0. Since the
notifier fields are written in order, it is guaranteed that the whole notifier structure has been written by the time
this field is set to 0.
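The 16-byte notifier layout can be modelled with Python's standard `struct` module. This is a minimal sketch rather than a driver API: `encode_notifier`, `decode_notifier` and `notifier_done` are hypothetical helper names, and the `big_endian` switch only mirrors the big-endian behaviour described above, which is itself still unverified on non-G80 hardware.

```python
import struct

NOTIFIER_SIZE = 16  # bytes per notifier structure


def _order(big_endian: bool) -> str:
    # struct byte-order prefix: ">" big-endian, "<" little-endian; neither adds padding
    return ">" if big_endian else "<"


def encode_notifier(time_low: int, time_high: int, big_endian: bool = False) -> bytes:
    """Build the 16 bytes PGRAPH is described as writing for one notifier."""
    timestamp = (time_high << 32) | time_low  # TIME_HIGH:TIME_LOW concatenation
    # +0x0: 64-bit timestamp, +0x8: 32-bit zero, +0xc: 32-bit zero (written last)
    return struct.pack(_order(big_endian) + "QII", timestamp, 0, 0)


def decode_notifier(raw: bytes, big_endian: bool = False):
    """Split a notifier into (timestamp, word at +0x8, word at +0xc)."""
    return struct.unpack(_order(big_endian) + "QII", raw[:NOTIFIER_SIZE])


def notifier_done(raw: bytes, big_endian: bool = False) -> bool:
    """Software arms +0xc with a non-0 value; PGRAPH clearing it means the write finished."""
    return decode_notifier(raw, big_endian)[2] == 0


# Usage: arm a fake notifier buffer, then check it the way a polling loop would.
buf = bytearray(NOTIFIER_SIZE)
struct.pack_into("<I", buf, 0xc, 0xffffffff)  # software marks the notifier as pending
print(notifier_done(bytes(buf)))              # False
buf[:] = encode_notifier(0x89abcdef, 0x01234567)
print(hex(decode_notifier(bytes(buf))[0]))    # 0x123456789abcdef
```

Checking only the +0xc word is enough because, as stated above, the fields are written in order, so a cleared +0xc implies the timestamp is already in place.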


Todo: verify big endian on non-G80 


There are two types of notifiers: ordinary notifiers [NV1-] and M2MF notifiers [NV3-]. Normal notifiers are written
when explicitly requested by the NOTIFY method; M2MF notifiers are written on M2MF transfer completion. M2MF
notifiers cannot be turned off, thus it's required to at least set up a notifier DMA object if M2MF is used, even if the
software doesn't wish to use notifiers for synchronization.


Todo: figure out NV20 mysterious warning notifiers 



Todo: describe GF100- notifiers 


The notifiers are always written to the currently bound notifier DMA object. The M2MF notifiers share the DMA 
object with ordinary notifiers. The layout of the DMA object used for notifiers is fixed: 


0x00: ordinary notifier #0
0x10: M2MF notifier [NV3-]
0x20: ordinary notifier #2 [NV3:NV4 only]
0x30: ordinary notifier #3 [NV3:NV4 only]
0x40: ordinary notifier #4 [NV3:NV4 only]
0x50: ordinary notifier #5 [NV3:NV4 only]
0x60: ordinary notifier #6 [NV3:NV4 only]
0x70: ordinary notifier #7 [NV3:NV4 only]
0x80: ordinary notifier #8 [NV3:NV4 only]
0x90: ordinary notifier #9 [NV3:NV4 only]
0xa0: ordinary notifier #10 [NV3:NV4 only]
0xb0: ordinary notifier #11 [NV3:NV4 only]
0xc0: ordinary notifier #12 [NV3:NV4 only]
0xd0: ordinary notifier #13 [NV3:NV4 only]
0xe0: ordinary notifier #14 [NV3:NV4 only]
0xf0: ordinary notifier #15 [NV3:NV4 only]


Todo: 0x20 - NV20 warning notifier? 


Note that the notifiers always have to reside at the very beginning of the DMA object. On NV1 and NV4+, this
effectively means that only 1 notifier of each type can be used per DMA object, requiring multiple DMA objects
if more than one notifier per type is to be used, and likely requiring a dedicated DMA object for the notifiers. On
NV3:NV4, up to 15 ordinary notifiers may be used in a single DMA object, though that DMA object likely still needs
to be dedicated for notifiers, and only one of the notifiers supports interrupt generation.
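The fixed notifier DMA object layout reduces to a small offset computation. This is a sketch under stated assumptions: `ordinary_notifier_offset` and its `nv3_family` flag (true for NV3:NV4 GPUs) are hypothetical names, and the only rules encoded are the layout table and the per-generation limits given above, with slot 0x10 reserved for the M2MF notifier.

```python
M2MF_NOTIFIER_OFFSET = 0x10  # NV3+, shares the notifier DMA object with ordinary notifiers


def ordinary_notifier_offset(index: int, nv3_family: bool) -> int:
    """Byte offset of ordinary notifier #index inside the notifier DMA object.

    nv3_family: True for NV3:NV4 GPUs (notifiers #0 and #2..#15 usable),
    False for NV1 and NV4+ (only notifier #0 usable).
    """
    if index == 1:
        # offset 0x10 holds the M2MF notifier, so there is no ordinary notifier #1
        raise ValueError("there is no ordinary notifier #1")
    max_index = 15 if nv3_family else 0
    if not 0 <= index <= max_index:
        raise ValueError(f"ordinary notifier #{index} is not available on this GPU")
    return index * 0x10  # 16-byte notifiers packed from the very start of the object


print([hex(ordinary_notifier_offset(i, True)) for i in (0, 2, 15)])  # ['0x0', '0x20', '0xf0']
```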


NOTIFY method 


Ordinary notifiers are requested via the NOTIFY method. Note that the NOTIFY method schedules a notifier write on 
completion of the method following the NOTIFY - NOTIFY merely sets “a notifier write is pending” state. 


It is an error if a NOTIFY method is followed by another NOTIFY method, a DMA_NOTIFY method, an object bind, 
or a subchannel switch. 


In addition to a notifier write, the NOTIFY method may also request a NOTIFY interrupt to be triggered on PGRAPH 
after the notifier write. 


mthd 0x104: NOTIFY [all NV1:GF100 graph objects] Requests a notifier write and maybe an interrupt. The
write/interrupt will actually be performed after the next method completes. Possible parameter values are:

    0: WRITE - write ordinary notifier #0
    1: WRITE_AND_AWAKEN - write ordinary notifier #0, then trigger NOTIFY interrupt [NV3-]
    2: WRITE_2 - write ordinary notifier #2 [NV3:NV4]
    3: WRITE_3 - write ordinary notifier #3 [NV3:NV4]
    [...]
    15: WRITE_15 - write ordinary notifier #15 [NV3:NV4]


Operation::

    if (!cur_grobj.NOTIFY_VALID) {
        /* DMA notify object not set, or needs to be swapped in by sw */
        throw(INVALID_NOTIFY);
    } else if ((param > 0 && gpu == NV1)
            || (param > 15 && gpu >= NV3 && gpu < NV4)
            || (param > 1 && gpu >= NV4)) {
        /* XXX: what state is changed? */
        throw(INVALID_VALUE);
    } else if (NOTIFY_PENDING) {
        /* tried to do two NOTIFY methods in row */
        /* XXX: what state is changed? */
        throw(DOUBLE_NOTIFY);
    } else {
        NOTIFY_PENDING = 1;
        NOTIFY_TYPE = param;
    }

After every method other than NOTIFY and DMA_NOTIFY, the following is done:

    if (NOTIFY_PENDING) {
        int idx = NOTIFY_TYPE;
        if (idx == 1)
            idx = 0;
        dma_write64(NOTIFY_DMA, idx*0x10+0x0, PTIMER.TIME_HIGH << 32 | PTIMER.TIME_LOW);
        dma_write32(NOTIFY_DMA, idx*0x10+0x8, 0);
        dma_write32(NOTIFY_DMA, idx*0x10+0xc, 0);
        if (NOTIFY_TYPE == 1)
            irq_trigger(NOTIFY);
        NOTIFY_PENDING = 0;
    }


If a subchannel switch or object bind is done while NOTIFY_PENDING is set, CTXSW_NOTIFY error is raised.


NOTE: NV1 has a 1-bit NOTIFY_PENDING field, allowing it to do notifier writes with interrupts, but lacks support
for setting it via the NOTIFY method. This functionality thus has to be emulated by the driver if needed.
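The NOTIFY pseudocode can be exercised as a small state machine. This sketch assumes a simplified GPU encoding (`gpu` is 1 for NV1, 3 for NV3:NV4, and 4 for anything newer); `NotifyModel`, `NotifyError`, `other_method` and `bind_or_switch` are hypothetical names, and the DMA writes are recorded in a list instead of being performed.

```python
class NotifyError(Exception):
    """Stands in for the PGRAPH error conditions thrown in the pseudocode."""


class NotifyModel:
    """Hypothetical model of NOTIFY_PENDING handling; gpu is 1 (NV1), 3 (NV3:NV4) or 4 (NV4+)."""

    def __init__(self, gpu: int, notify_valid: bool = True):
        self.gpu = gpu
        self.notify_valid = notify_valid  # graph object option NOTIFY_VALID
        self.pending = False              # NOTIFY_PENDING
        self.notify_type = 0              # NOTIFY_TYPE
        self.writes = []                  # (offset, value, size) instead of real DMA writes
        self.irqs = 0                     # NOTIFY interrupts triggered

    def notify(self, param: int) -> None:
        """mthd 0x104: NOTIFY."""
        if not self.notify_valid:
            raise NotifyError("INVALID_NOTIFY")
        if ((param > 0 and self.gpu == 1)
                or (param > 15 and self.gpu == 3)
                or (param > 1 and self.gpu >= 4)):
            raise NotifyError("INVALID_VALUE")
        if self.pending:
            raise NotifyError("DOUBLE_NOTIFY")
        self.pending = True
        self.notify_type = param

    def other_method(self, time_low: int, time_high: int) -> None:
        """Runs after any method other than NOTIFY and DMA_NOTIFY completes."""
        if not self.pending:
            return
        idx = 0 if self.notify_type == 1 else self.notify_type  # WRITE_AND_AWAKEN uses #0
        base = idx * 0x10
        self.writes.append((base + 0x0, (time_high << 32) | time_low, 8))
        self.writes.append((base + 0x8, 0, 4))
        self.writes.append((base + 0xc, 0, 4))
        if self.notify_type == 1:
            self.irqs += 1
        self.pending = False

    def bind_or_switch(self) -> None:
        """Subchannel switch or object bind."""
        if self.pending:
            raise NotifyError("CTXSW_NOTIFY")
```

In this model, a NOP (mthd 0x100) on NV4+ corresponds to a call to `other_method()`: it does nothing by itself but completes a pending notifier write.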


DMA_NOTIFY method 


On NV4+, the notifier DMA object can be bound by submitting the DMA_NOTIFY method. This functionality can 
be disabled by the driver in PGRAPH settings registers if not desired. 


mthd 0x180: DMA_NOTIFY [all NV4:GF100 graph objects] Sets the notifier DMA object. When submitted
through PFIFO, this method will undergo handle -> address translation via RAMHT.


Operation::

    if (DMA_METHODS_ENABLE) {
        /* XXX: list the validation checks */
        NOTIFY_DMA = param;
    } else {
        throw(INVALID_METHOD);
    }

NOP method


On NV4+ a NOP method was added to enable asking for a notifier write without having to submit an actual method 
to the object. The NOP method does nothing, but still counts as a graph object method and will thus trigger a notifier 
write/interrupt if one was previously requested. 


mthd 0x100: NOP [all NV4+ graph objects] Does nothing. 


Operation::

    /* nothing */


Todo: figure out if this method can be disabled for NV1 compat 


2.9.2 The memory copying objects 


Contents 


* The memory copying objects
  - Introduction
  - M2MF objects
  - P2MF objects
  - Input/output setup
  - Operation


Introduction 


Todo: write me 


M2MF objects 


Todo: write me 


P2MF objects 


Todo: write me 


Input/output setup 

Todo: write me


Operation 


Todo: write me 


2.9.3 2D pipeline 


Contents: 


Overview of the 2D pipeline 


Contents 


* Overview of the 2D pipeline
  - Introduction
  - The objects
    * Connecting the objects - NV1 style
    * Connecting the objects - NV5 style
  - Color and monochrome formats
    * COLOR_FORMAT methods
    * Color format conversions
    * Monochrome formats
  - The pipeline
    * Pipeline configuration: NV1
    * Clipping
    * Source format conversion
    * Buffer read
    * Bitwise operation
    * Chroma key
    * The plane mask
    * Blending
    * Dithering
    * The framebuffer
  - NV1 canvas
  - NV3 surfaces
  - Clip rectangles
  - NV1-style operation objects
  - Unified 2d objects


Introduction 


On nvidia GPUs, 2d operations are done by the PGRAPH engine [see graph/intro.txt]. The 2d engine is rather
orthogonal and has the following features:


* various data sources:

  - solid color shapes (points, lines, triangles, rectangles)
  - pixels uploaded directly through command stream, raw or expanded using a palette
  - text with in-memory fonts [NV3:G80]
  - rectangles blitted from another area of video memory
  - pixels read by DMA
  - linearly and quadratically textured quads [NV1:NV3]

* color format conversions

* chroma key

* clipping rectangles

* per-pixel operations between source, destination, and pattern:

  - logic operations
  - alpha and beta blending
  - pre-multiplied alpha blending [NV4-]

* plane masking [NV1:NV4]

* dithering

* data output:

  - to the framebuffer [NV1:NV3]
  - to any surface in VRAM [NV3:G84]
  - to arbitrary memory [G84-]


The objects 


The 2d engine is controlled by the user via PGRAPH objects. On NV1:G84, each piece of 2d functionality has its 
own object class - a matching set of objects needs to be used together to perform an operation. G80+ have a unified 
2d engine object that can be used to control all of the 2d pipeline in one place. 


The non-unified objects can be divided into 3 classes:

* source objects: control the drawing operation, choose pixels to draw and their colors

* context objects: control various pipeline settings shared by other objects

* operation objects: connect source and context objects together

The source objects are: 

* POINT, LIN, LINE, TRI, RECT: drawing of solid color shapes 

* IFC, BITMAP, SIFC, INDEX, TEXTURE: drawing of pixel data from CPU 

* BLIT: copying rectangles from another area of video memory 

* IFM, SIFM: drawing pixel data from DMA 

* GDI: Drawing solid rectangles and text fonts 

* TEXLIN, TEXQUAD, TEXLINBETA, TEXQUADBETA: Drawing textured quads 
The context objects are: 

* BETA: blend factor 

* ROP: logic operation 

* CHROMA: color for chroma key 

* PLANE: color for plane mask 

* CLIP: clipping rectangle 

* PATTERN: repeating pattern image [graph/pattern.txt] 

* BETA4: pre-multiplied blend factor 

* SURF, SURF2D, SWZSURF: destination and blit source surface setup 

The operation objects are:

* OP_CLIP: clipping operation

* OP_BLEND_AND: blending

* OP_ROP_AND: logic operation

* OP_CHROMA: color key

* OP_SRCCOPY_AND: source copy with 0-alpha discard

* OP_SRCCOPY: source copy

* OP_SRCCOPY_PREMULT: pre-multiplying copy

* OP_BLEND_PREMULT: pre-multiplied blending

The unified 2d engine objects are described below.

The objects that, although related to 2d operations, aren't part of the usual 2d pipeline:

* ITM: downloading framebuffer data to DMA

* M2MF: DMA to DMA copies

* DVD_SUBPICTURE: blending of YUV data


Note that, although multiple objects of a single kind may be created, there is only one copy of pipeline state data in 
PGRAPH. There are thus two usage possibilities: 


* aliasing: all objects on a channel access common pipeline state, making it mostly useless to create several
  objects of a single kind


* swapping: the kernel driver or some other piece of software handles PGRAPH interrupts, swapping pipeline 
configurations as they're needed, and marking objects valid/not valid according to currently loaded configuration 

Connecting the objects - NV1 style


The objects were originally intended and designed for connecting with so-called patchcords. A patchcord is a dummy 
object that's conceptually a wire carrying some sort of data. The patchcord types are: 


* image patchcord: carries pixel color data 

* beta patchcord: carries beta blend factor data 
* zeta patchcord: carries pixel depth data 

* rop patchcord: carries logic operation data 


Each 2d object has patchcord "slots" representing its inputs and outputs. A slot is represented by an object method.
Objects are connected together by creating a patchcord of appropriate type and writing its handle to the input slot
method on one object and the output slot method on the other object. For example:


* source objects have an output image patchcord slot [BLIT also has input image slot] 
* BETA context object has an output beta slot 
* OP_BLEND_AND has two image input slots, one beta input slot, and one image output slot


A valid set of objects, called a "patch", is constructed by connecting patchcords appropriately. Not all possible
connections are valid, though. Only ones that map to the actual hardware pipeline are allowed: one of the source objects
must be at the beginning, connected via image patchcord to OP_BLEND_*, OP_ROP_AND, or OP_SRCCOPY_*,
optionally connected further through OP_CLIP and/or OP_CHROMA, then finally connected to a SURF object
representing the destination surface. Each of the OP_* objects and source objects that needs it must also be connected
to the appropriate extra inputs, like the CLIP rectangle, PATTERN or another SURF, or CHROMA key.
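The rule for a valid patch can be checked mechanically on the image-patchcord chain alone. This is a minimal sketch, assuming each object is represented by a plain kind string; it treats OP_SRCCOPY_* as covering OP_SRCCOPY_AND, OP_SRCCOPY and OP_SRCCOPY_PREMULT, lets OP_CLIP and OP_CHROMA appear in either order, uses a generic "SURF" for the destination surface, and does not check the extra inputs (CLIP, PATTERN, CHROMA). `is_valid_image_chain` is a hypothetical name.

```python
SOURCE_OBJECTS = {
    "POINT", "LIN", "LINE", "TRI", "RECT", "IFC", "BITMAP", "SIFC", "INDEX",
    "TEXTURE", "BLIT", "IFM", "SIFM", "GDI",
    "TEXLIN", "TEXQUAD", "TEXLINBETA", "TEXQUADBETA",
}
# OP_BLEND_*, OP_ROP_AND and OP_SRCCOPY_* (plain OP_SRCCOPY included by assumption)
PIXEL_OPS = {
    "OP_BLEND_AND", "OP_BLEND_PREMULT", "OP_ROP_AND",
    "OP_SRCCOPY_AND", "OP_SRCCOPY", "OP_SRCCOPY_PREMULT",
}
OPTIONAL_OPS = {"OP_CLIP", "OP_CHROMA"}


def is_valid_image_chain(chain: list) -> bool:
    """Check source -> pixel op -> [OP_CLIP / OP_CHROMA] -> SURF along image patchcords."""
    if len(chain) < 3:
        return False
    source, op, *middle, dest = chain
    if source not in SOURCE_OBJECTS or op not in PIXEL_OPS or dest != "SURF":
        return False
    # OP_CLIP and/or OP_CHROMA, each used at most once
    return all(m in OPTIONAL_OPS for m in middle) and len(middle) == len(set(middle))


print(is_valid_image_chain(["BLIT", "OP_ROP_AND", "OP_CLIP", "SURF"]))  # True
print(is_valid_image_chain(["IFC", "OP_CLIP", "SURF"]))                 # False
```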


No GPU has ever supported connecting patchcords in hardware - the software must deal with all required processing
and state swapping. However, NV4:NV20 hardware knows of the methods reserved for this purpose, and raises a
special interrupt when they're called. The OP_* objects, while lacking in any useful hardware methods, are also
supported on NV4:NV5.


Connecting the objects - NV5 style 


A new way of connecting objects was designed for NV5 [but can be used with earlier cards via software emulation].
Instead of treating a patch as a freeform set of objects, the patch is centered on the source object. While context
objects are still in use, operation objects are skipped - the set of operations to perform is specified at the source object,
instead of being implied by the patchcord topology. The context objects are now connected directly to the source object
by writing their handles to appropriate source object methods. The OP_CLIP and OP_CHROMA functionality is
replaced by CLIP and CHROMA methods on the source objects: enabling clipping/color keying is done by connecting
the appropriate context object, while disabling is done by connecting a NULL object. The remaining operation objects
are replaced by the OPERATION method, which takes an enum selecting the operation to perform.


NV5 added support for the NV5-style connections in hardware - all methods can be processed without software 
assistance as long as only one object of each type is in use [or they're allowed to alias]. If swapping is required, it's the 
responsibility of software. The new methods can be globally disabled if NV1-style connections are desired, however. 
NV5-style connections can also be implemented for older GPUs simply by handling the relevant methods in software. 


Color and monochrome formats 


Todo: write me 

COLOR_FORMAT methods


mthd 0x300: COLOR_FORMAT [NV1_CHROMA, NV1_PATTERN] [NV4-] Sets the color format using NV1
color enum.


Operation:

    cur_grobj.COLOR_FORMAT = get_nv1_color_format(param);


Todo: figure out this enum 


mthd 0x300: COLOR_FORMAT [NV4_CHROMA, NV4_PATTERN] Sets the color format using NV4 color
enum.


Operation:

    cur_grobj.COLOR_FORMAT = get_nv4_color_format(param);


Todo: figure out this enum 


Color format conversions 


Todo: write me 


Monochrome formats 


Todo: write me 


mthd 0x304: MONO_FORMAT [NV1_PATTERN] [NV4-] Sets the monochrome format.


Operation:

    if (param != LE && param != CGA6)
        throw(INVALID_ENUM);
    cur_grobj.MONO_FORMAT = param;


Todo: check 


The pipeline 


The 2d pipeline consists of the following stages, in order: 


1. Image source: one of the source objects, or one of the three source types on the unified 2d objects [SOLID,
   SIFC, or BLIT] - see documentation of the relevant object

2. Clipping

3. Source color conversion

4. One of:

   a. Bitwise operation subpipeline, consisting of:

      1. Optionally, an arbitrary bitwise operation done on the source, the destination, and the pattern.
      2. Optionally, a color key operation
      3. Optionally, a plane mask operation [NV1:NV4]

   b. Blending operation subpipeline, consisting of:

      1. Blend factor calculation
      2. Blending

5. Dithering

6. Destination write


In addition, the pipeline may be used in RGB mode [treating colors as made of R, G, B components], or index mode
[treating colors as 8-bit palette index]. The pipeline mode is determined automatically by the hardware based on
source and destination formats and some configuration bits.


The pixels are rendered to a destination buffer. On NV1:NV4, more than one destination buffer may be enabled at a
time. If this is the case, the pixel operations are executed separately for each buffer.


Pipeline configuration: NV1 


The pipeline configuration is stored in graph options and other PGRAPH registers. It cannot be changed by user-visible 
commands other than via rebinding objects. The following options are stored in the graph object: 


* the operation, one of:

  - RPOP_DS - RPOP(DST, SRC)
  - ROP_SDD - ROP(SRC, DST, DST)
  - ROP_DSD - ROP(DST, SRC, DST)
  - ROP_SSD - ROP(SRC, SRC, DST)
  - ROP_DDS - ROP(DST, DST, SRC)
  - ROP_SDS - ROP(SRC, DST, SRC)
  - ROP_DSS - ROP(DST, SRC, SRC)
  - ROP_SSS - ROP(SRC, SRC, SRC)
  - ROP_SSS_ALT - ROP(SRC, SRC, SRC)
  - ROP_PSS - ROP(PAT, SRC, SRC)
  - ROP_SPS - ROP(SRC, PAT, SRC)
  - ROP_PPS - ROP(PAT, PAT, SRC)
  - ROP_SSP - ROP(SRC, SRC, PAT)
  - ROP_PSP - ROP(PAT, SRC, PAT)
  - ROP_SPP - ROP(SRC, PAT, PAT)
  - RPOP_SP - ROP(SRC, PAT)
  - ROP_DSP - ROP(DST, SRC, PAT)
  - ROP_SDP - ROP(SRC, DST, PAT)
  - ROP_DPS - ROP(DST, PAT, SRC)
  - ROP_PDS - ROP(PAT, DST, SRC)
  - ROP_SPD - ROP(SRC, PAT, DST)
  - ROP_PSD - ROP(PAT, SRC, DST)
  - SRCCOPY - SRC [no operation]
  - BLEND_DS_AA - BLEND(DST, SRC, SRC.ALPHA^2) [XXX check]
  - BLEND_DS_AB - BLEND(DST, SRC, SRC.ALPHA * BETA)
  - BLEND_DS_AIB - BLEND(DST, SRC, SRC.ALPHA * (1-BETA))
  - BLEND_PS_B - BLEND(PAT, SRC, BETA)
  - BLEND_PS_IB - BLEND(SRC, PAT, (1-BETA))

  If the operation is set to one of the BLEND_* values, blending subpipeline will be active. Otherwise, the bitwise
  operation subpipeline will be active. For bitwise operation pipeline, RPOP_* and ROP_* will cause the bitwise
  operation stage to be enabled with the appropriate options, while the SRCCOPY setting will cause it to be
  disabled and bypassed.

* chroma enable: if this is set to 1, and the bitwise operation subpipeline is active, the color key stage will be
  enabled

* plane mask enable: if this is set to 1, and the bitwise operation subpipeline is active, the plane mask stage will
  be enabled

* user clip enable: if set to 1, the user clip rectangle will be enabled in the clipping stage
* destination buffer mask: selects which destination buffers will be written
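The ROP_xyz settings name the order in which SRC, DST and PAT are fed to the 8-bit raster operation. Assuming that ROP(x, y, z) is the 3-input bit operation from the "Bit operations" chapter with x as the first input (which matches GDI ROP3 numbering for ROP_DSP, where 0xcc is a plain source copy and 0xf0 a pattern copy), the evaluation can be sketched as follows. `bitop` here is a fixed-width variant of that chapter's function, `apply_rop_setting` is a hypothetical helper, and the two-operand RPOP_* settings are deliberately rejected since their exact semantics are not described here.

```python
def bitop(op: int, width: int, *inputs: int) -> int:
    """N-input bitwise operation with truth-table mask `op`, applied to `width`-bit values.

    Input n contributes bit n of the truth-table index, as in the "Bit operations" chapter.
    """
    res = 0
    for bit in range(width):
        idx = 0
        for n, value in enumerate(inputs):
            idx |= ((value >> bit) & 1) << n
        res |= ((op >> idx) & 1) << bit
    return res


def apply_rop_setting(setting: str, rop: int, src: int, dst: int, pat: int,
                      width: int = 32) -> int:
    """Evaluate a three-operand ROP_xyz setting, e.g. ROP_DSP -> ROP(DST, SRC, PAT)."""
    prefix, _, rest = setting.partition("_")
    letters = rest.split("_")[0]  # "SSS" for ROP_SSS_ALT
    if prefix != "ROP" or len(letters) != 3:
        raise ValueError(f"{setting} is not a three-operand ROP setting")
    operands = {"S": src, "D": dst, "P": pat}
    return bitop(rop, width, *(operands[c] for c in letters))


# ROP_DSP with 0x66 (DST xor SRC in this numbering) on 8-bit values
print(hex(apply_rop_setting("ROP_DSP", 0x66, src=0x12, dst=0x34, pat=0x56, width=8)))  # 0x26
```

Parsing the operand order out of the setting name works because every three-operand entry in the list above follows the ROP_xyz to ROP(x, y, z) pattern.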

The following options are stored in other PGRAPH registers:

* palette bypass bit: determines the value of the palette bypass bit written to the framebuffer

* Y8 expand: determines pipeline mode used with Y8 source and non-Y8 destination - if set, Y8 is upconverted
  to RGB and the RGB mode is used, otherwise the index mode is used

* dither enable: if set, and several conditions are fulfilled, dithering stage will be enabled

* software mode: if set, all drawing operations will trap without touching the framebuffer, allowing software to
  perform the operation instead


The pipeline mode is selected as follows:

* if the blending subpipeline is used, RGB mode is selected [index blending is not supported]
* if the bitwise operation subpipeline is used:
  - if the destination format is Y8, indexed mode is selected
  - if the destination format is D1R5G5B5 or D1X1R10G10B10:
    * if the source format is not Y8 or Y8 expand is enabled, RGB mode is selected
    * if the source format is Y8 and Y8 expand is not enabled, indexed mode is selected

In RGB mode, the pipeline internally uses 10-bit components. In index mode, 8-bit indices are used.
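The selection rules above amount to a small decision function. The C sketch below is only an illustration: the enum names and the `src_is_y8` / `y8_expand` parameters of `select_pipeline_mode` are hypothetical stand-ins for the corresponding PGRAPH state, whose register encoding is not described here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; not the hardware encoding of these settings. */
enum subpipeline { SUBPIPE_BITWISE, SUBPIPE_BLEND };
enum dst_format { DST_Y8, DST_D1R5G5B5, DST_D1X1R10G10B10 };
enum pipe_mode { MODE_RGB, MODE_INDEX };

enum pipe_mode select_pipeline_mode(enum subpipeline sp, enum dst_format dst,
                                    bool src_is_y8, bool y8_expand) {
    if (sp == SUBPIPE_BLEND)
        return MODE_RGB;       /* index blending is not supported */
    if (dst == DST_Y8)
        return MODE_INDEX;
    /* D1R5G5B5 or D1X1R10G10B10 destination */
    if (!src_is_y8 || y8_expand)
        return MODE_RGB;
    return MODE_INDEX;         /* Y8 source without Y8 expand */
}
```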


See nv1-pgraph for more information on the configuration registers. 


Clipping 


Todo: write me 


Source format conversion 


Firstly, the source color is converted from its original format to the format used for operations. 


Todo: figure out what happens on ITM, IFM, BLIT, TEX*BETA 


On NV1, all operations are done in A8R10G10B10 or I8 format internally. In RGB mode, colors are converted using the standard color expansion formula. In index mode, the index is taken from the low 8 bits of the color.


src.B = get_color_b10(cur_grobj, color);
src.G = get_color_g10(cur_grobj, color);
src.R = get_color_r10(cur_grobj, color);
src.A = get_color_a8(cur_grobj, color);
src.I = color[0:7];


In addition, pixels are discarded [all processing is aborted and the destination buffer is left untouched] if the alpha 
component is 0 [even in index mode]. 


if (!src.A) 
discard; 
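A sketch of this step for one source format, in C. Two things here are assumptions rather than documented facts: the source format (A8R8G8B8 is used purely as an example) and the widening rule, which is bit replication, a common way to expand an n-bit component; the exact NV1 formula is not given in this document. `struct src_color`, `expand_to_10` and `convert_a8r8g8b8` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical internal representation mirroring the src.* fields above. */
struct src_color { uint16_t b, g, r; uint8_t a, i; };

/* ASSUMPTION: widen an n-bit component (1 <= bits <= 10) to 10 bits by
 * repeating its bit pattern from the top down (bit replication). */
uint16_t expand_to_10(uint32_t v, int bits) {
    assert(bits >= 1 && bits <= 10);
    uint32_t out = 0;
    for (int pos = 10 - bits; pos > -bits; pos -= bits)
        out |= pos >= 0 ? v << pos : v >> -pos;
    return out & 0x3ff;
}

/* Convert an example A8R8G8B8 source color. Returns false when the pixel
 * is discarded because its alpha is 0 (this applies even in index mode). */
bool convert_a8r8g8b8(uint32_t color, struct src_color *src) {
    src->b = expand_to_10(color & 0xff, 8);
    src->g = expand_to_10(color >> 8 & 0xff, 8);
    src->r = expand_to_10(color >> 16 & 0xff, 8);
    src->a = color >> 24;
    src->i = color & 0xff;   /* index mode: low 8 bits of the color */
    return src->a != 0;
}
```

With these assumptions, an 8-bit component of 0x80 becomes 0x202, and a color whose alpha byte is 0 is reported as discarded regardless of its other components.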


Todo: NV3+ 


Buffer read 


In some blending and bitwise operation modes, the current contents of the destination buffer at the drawn pixel location 
may be used as an input to the 2d pipeline. 


Todo: document that and BLIT 


Bitwise operation 


Todo: write me 
Chroma key


Todo: write me 


The plane mask 


Todo: write me 


Blending 


Todo: write me 


Dithering 


Todo: write me 


The framebuffer 


Todo: write me 


NV1 canvas 


Todo: write me 


NV3 surfaces 


Todo: write me 


Clip rectangles 


Todo: write me 


2.9. PGRAPH: 2d/3d graphics and compute engine 




nVidia Hardware Documentation, Release git 


NV1-style operation objects 


Todo: write me 


Unified 2d objects 


Todo: write me 


0100                  NOP                      [graph/intro.txt]
0104                  NOTIFY                   [G80_2D] [graph/intro.txt]
[XXX: GF100 methods]
0110                  WAIT_FOR_IDLE            [graph/intro.txt]
0140                  PM_TRIGGER               [graph/intro.txt]
0180                  DMA_NOTIFY               [G80_2D] [graph/intro.txt]
0184                  DMA_SRC                  [G80_2D] [XXX]
0188                  DMA_DST                  [G80_2D] [XXX]
018c                  DMA_COND                 [G80_2D] [XXX]
[XXX: 0200-02ac]
02b0                  PATTERN_OFFSET           [graph/pattern.txt]
02b4                  PATTERN_SELECT           [graph/pattern.txt]
024c                  ???                      [GF100_2D-] [XXX]
02e0                  ???                      [GF100_2D-] [XXX]
02e8                  PATTERN_COLOR_FORMAT     [graph/pattern.txt]
02ec                  PATTERN_BITMAP_FORMAT    [graph/pattern.txt]
02f0+i*4, i<2         PATTERN_BITMAP_COLOR     [graph/pattern.txt]
02f8+i*4, i<2         PATTERN_BITMAP           [graph/pattern.txt]
0300+i*4, i<64        PATTERN_X8R8G8B8         [graph/pattern.txt]
0400+i*4, i<32        PATTERN_R5G6B5           [graph/pattern.txt]
0480+i*4, i<32        PATTERN_X1R5G5B5         [graph/pattern.txt]
0500+i*4, i<16        PATTERN_Y8               [graph/pattern.txt]
[XXX: 0540-08dc]
08e0+i*4, i<32        FIRMWARE                 [graph/intro.txt]
[XXX: GF100 methods]


2D pattern 


Contents 


* 2D pattern
  - Introduction
  - PATTERN objects
  - Pattern selection
  - Pattern coordinates
  - Bitmap pattern
  - Color pattern


Introduction 


One of the configurable inputs to the bitwise operation and, on NV1:NV4, the blending operation is the pattern. A pattern is an infinitely repeating 8x8, 64x1, or 1x64 image. There are two types of patterns:

* bitmap pattern: an arbitrary 2-color 8x8, 64x1, or 1x64 image
* color pattern: an arbitrary 8x8 R8G8B8 image [NV4-]

The pattern can be set through the NV1-style *_PATTERN context objects, or through the G80-style unified 2d objects. For details on how and when the pattern is used, see 2D pattern.


The graph context used for pattern storage is made of:

* pattern type selection: bitmap or color [NV4-]
* bitmap pattern state:
  - shape selection: 8x8, 1x64, or 64x1
  - the bitmap: 2 32-bit words
  - 2 colors: A8R10G10B10 format [NV1:NV4]
  - 2 colors: 32-bit word + format selector each [NV4:G80]
  - 2 colors: 32-bit word each [G80-]
  - color format selection [G80-]
  - bitmap format selection [G80-]
* color pattern state [NV4-]:
  - 64 colors: R8G8B8 format
* pattern offset: 2 6-bit numbers [G80-]


PATTERN objects 


The PATTERN object family deals with setting up the pattern. The objects in this family are:

* objtype 0x06: NV1_PATTERN [NV1:NV4]
* class 0x0018: NV1_PATTERN [NV4:G80]
* class 0x0044: NV4_PATTERN [NV4:G84]


The methods for this family are: 


0100                  NOP                   [NV4-] [graph/intro.txt]
0104                  NOTIFY                [graph/intro.txt]
0110                  WAIT_FOR_IDLE         [G80-] [graph/intro.txt]
0140                  PM_TRIGGER            [NV40-?] [XXX] [graph/intro.txt]
0180 N                DMA_NOTIFY            [NV4-] [graph/intro.txt]
0200 O                PATCH_IMAGE_OUTPUT    [NV4:NV20] [see below]
0300                  COLOR_FORMAT          [NV4-] [see below]
0304                  BITMAP_FORMAT         [NV4-] [see below]
0308                  BITMAP_SHAPE          [see below]
030c                  TYPE                  [NV4_PATTERN] [see below]
0310+i*4, i<2         BITMAP_COLOR          [see below]
0318+i*4, i<2         BITMAP                [see below]
0400+i*4, i<16        COLOR_Y8              [NV4_PATTERN] [see below]
0500+i*4, i<32        COLOR_R5G6B5          [NV4_PATTERN] [see below]
0600+i*4, i<32        COLOR_X1R5G5B5        [NV4_PATTERN] [see below]
0700+i*4, i<64        COLOR_X8R8G8B8        [NV4_PATTERN] [see below]


mthd 0x200: PATCH_IMAGE_OUTPUT [*_PATTERN] [NV4:NV20]
Reserved for plugging an image patchcord to output the pattern into.

Operation:

throw(UNIMPLEMENTED_MTHD);


Pattern selection 


With the *_PATTERN objects, the pattern type is selected using the TYPE and BITMAP_SHAPE methods:

mthd 0x030c: TYPE [NV4_PATTERN]
Sets the pattern type. One of:
  1: BITMAP
  2: COLOR

Operation:

if (NV4:G80) {
    PATTERN_TYPE = param;
} else {
    SHADOW_COMP2D.PATTERN_TYPE = param;
    if (SHADOW_COMP2D.PATTERN_TYPE == COLOR)
        PATTERN_SELECT = COLOR;
    else
        PATTERN_SELECT = SHADOW_COMP2D.PATTERN_BITMAP_SHAPE;
}
mthd 0x308: BITMAP_SHAPE [*_PATTERN]
Sets the pattern shape. One of:
  0: 8x8
  1: 64x1
  2: 1x64
On unified 2d objects, use the PATTERN_SELECT method instead.

Operation:

if (param > 2)
    throw(INVALID_ENUM);
if (NV1:G80) {
    PATTERN_BITMAP_SHAPE = param;
} else {
    SHADOW_COMP2D.PATTERN_BITMAP_SHAPE = param;
    if (SHADOW_COMP2D.PATTERN_TYPE == COLOR)
        PATTERN_SELECT = COLOR;
    else
        PATTERN_SELECT = SHADOW_COMP2D.PATTERN_BITMAP_SHAPE;
}


With the unified 2d objects, the pattern type is selected along with the bitmap shape using the PATTERN_SELECT method:

mthd 0x02bc: PATTERN_SELECT [*_2D]
Sets the pattern type and shape. One of:
  0: BITMAP_8X8
  1: BITMAP_64X1
  2: BITMAP_1X64
  3: COLOR

Operation:

if (param < 4)
    PATTERN_SELECT = SHADOW_2D.PATTERN_SELECT = param;
else
    throw(INVALID_ENUM);


Pattern coordinates 


The pattern pixel is selected according to pattern coordinates: px, py. On NV1:G80, the pattern coordinates are equal to absolute [ie. not canvas-relative] coordinates in the destination surface. On G80+, an offset can be added to the coordinates. The offset is set by the PATTERN_OFFSET method:

mthd 0x02b0: PATTERN_OFFSET [*_2D]
Sets the pattern offset.
  bits 0-5: X offset
  bits 8-13: Y offset

Operation:

PATTERN_OFFSET = param;


The offset values are added to the destination surface X, Y coordinates to obtain px, py coordinates. 
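On G80+, deriving px, py for a destination pixel can be sketched in C as follows. `pattern_coords` is a hypothetical helper; it assumes bits outside the two documented fields are ignored, and it applies no wrapping, since the pattern lookup only consumes the low bits of px and py anyway.

```c
#include <assert.h>
#include <stdint.h>

/* Pattern coordinates for destination surface pixel (x, y), given the
 * value last written with the PATTERN_OFFSET method. */
void pattern_coords(uint32_t pattern_offset, uint32_t x, uint32_t y,
                    uint32_t *px, uint32_t *py) {
    uint32_t off_x = pattern_offset & 0x3f;        /* bits 0-5 */
    uint32_t off_y = (pattern_offset >> 8) & 0x3f; /* bits 8-13 */
    *px = x + off_x;
    *py = y + off_y;
}
```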


Bitmap pattern 


The bitmap pattern is made of three parts:

* two-color palette
* 64 bits of pattern: each bit describes one pixel of the pattern and selects which color to use
* shape selector: determines whether the bitmap is 8x8, 64x1, or 1x64


The color to use for given pattern coordinates is selected as follows: 


b6 bit;
if (shape == 8x8)
    bit = (py&7) << 3 | (px&7);
else if (shape == 64x1)
    bit = px & 0x3f;
else if (shape == 1x64)
    bit = py & 0x3f;

b1 pixel = PATTERN_BITMAP[bit[5]][bit[0:4]];
color = PATTERN_BITMAP_COLOR[pixel];
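The lookup above can be sketched in C. This assumes the two PATTERN_BITMAP words hold pattern bits 0-31 and 32-63 in the internal LE order (that is, after any CGA6 reordering), and that both colors are already resolved to a single representation. The enum values follow the BITMAP_SHAPE method; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum bitmap_shape { SHAPE_8X8 = 0, SHAPE_64X1 = 1, SHAPE_1X64 = 2 };

/* bitmap[0] = pattern bits 0-31, bitmap[1] = bits 32-63;
 * colors[0] / colors[1] are used for '0' / '1' bits. */
uint32_t bitmap_pattern_color(const uint32_t bitmap[2], const uint32_t colors[2],
                              enum bitmap_shape shape, uint32_t px, uint32_t py) {
    unsigned bit;
    switch (shape) {
    case SHAPE_8X8:  bit = (py & 7) << 3 | (px & 7); break;
    case SHAPE_64X1: bit = px & 0x3f; break;
    default:         bit = py & 0x3f; break; /* SHAPE_1X64 */
    }
    unsigned pixel = bitmap[bit >> 5] >> (bit & 31) & 1; /* bit[5] picks the word */
    return colors[pixel];
}
```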


On NV1:NV4, the color is internally stored in A8R10G10B10 format and upconverted from the source format when submitted. On NV4:G80, it is stored in the original format it was submitted with, and is annotated with the format information as of the submission. On G80+, it is also stored as it was submitted, but is not annotated with format information - the format used to interpret it is the most recently submitted pattern color format.


On NV1:G80, the color and bitmap formats are stored in graph options for the PATTERN object. On G80+, they're 
part of main graph state instead. 


The methods dealing with bitmap patterns are:

mthd 0x300: COLOR_FORMAT [NV1_PATTERN] [NV4-]
Sets the color format used for subsequent bitmap pattern colors. One of:
  1: X16A8Y8
  2: X16A1R5G5B5
  3: A8R8G8B8

Operation:

switch (param) {
    case 1: cur_grobj.color_format = X16A8Y8; break;
    case 2: cur_grobj.color_format = X16A1R5G5B5; break;
    case 3: cur_grobj.color_format = A8R8G8B8; break;
    default: throw(INVALID_ENUM);
}
mthd 0x300: COLOR_FORMAT [NV4_PATTERN]
Sets the color format used for subsequent bitmap pattern colors. One of:
  1: A16R5G6B5
  2: X16A1R5G5B5
  3: A8R8G8B8

Operation:

if (NV4:G80) {
    switch (param) {
        case 1: cur_grobj.color_format = A16R5G6B5; break;
        case 2: cur_grobj.color_format = X16A1R5G5B5; break;
        case 3: cur_grobj.color_format = A8R8G8B8; break;
        default: throw(INVALID_ENUM);
    }
} else {
    SHADOW_COMP2D.PATTERN_COLOR_FORMAT = param;
    switch (param) {
        case 1: PATTERN_COLOR_FORMAT = A16R5G6B5; break;
        case 2: PATTERN_COLOR_FORMAT = X16A1R5G5B5; break;
        case 3: PATTERN_COLOR_FORMAT = A8R8G8B8; break;
        default: throw(INVALID_ENUM);
    }
}
mthd 0x2e8: PATTERN_COLOR_FORMAT [G80_2D]
Sets the color format used for bitmap pattern colors. One of:
  0: A16R5G6B5
  1: X16A1R5G5B5
  2: A8R8G8B8
  3: X16A8Y8
  4: ??? [XXX]
  5: ??? [XXX]

Operation:

if (param < 6)
    PATTERN_COLOR_FORMAT = SHADOW_2D.PATTERN_COLOR_FORMAT = param;
else
    throw(INVALID_ENUM);
mthd 0x304: BITMAP_FORMAT [*_PATTERN] [NV4-]
Sets the bitmap format used for subsequent pattern bitmaps. One of:
  1: LE
  2: CGA6

Operation:

if (NV4:G80) {
    switch (param) {
        case 1: cur_grobj.bitmap_format = LE; break;
        case 2: cur_grobj.bitmap_format = CGA6; break;
        default: throw(INVALID_ENUM);
    }
} else {
    switch (param) {
        case 1: PATTERN_BITMAP_FORMAT = LE; break;
        case 2: PATTERN_BITMAP_FORMAT = CGA6; break;
        default: throw(INVALID_ENUM);
    }
}
mthd 0x2ec: PATTERN_BITMAP_FORMAT [*_2D]
Sets the bitmap format used for pattern bitmaps. One of:
  0: LE
  1: CGA6

Operation:

if (param < 2)
    PATTERN_BITMAP_FORMAT = param;
else
    throw(INVALID_ENUM);


mthd 0x310+i*4, i<2: BITMAP_COLOR [*_PATTERN]
mthd 0x2f0+i*4, i<2: PATTERN_BITMAP_COLOR [*_2D]
Sets the colors used for the bitmap pattern. i=0 sets the color used for pixels corresponding to '0' bits in the pattern, i=1 sets the color used for '1' bits.

Operation:

if (NV1:NV4) {
    PATTERN_BITMAP_COLOR[i].B = get_color_b10(cur_grobj, param);
    PATTERN_BITMAP_COLOR[i].G = get_color_g10(cur_grobj, param);
    PATTERN_BITMAP_COLOR[i].R = get_color_r10(cur_grobj, param);
    PATTERN_BITMAP_COLOR[i].A = get_color_a8(cur_grobj, param);
} else if (NV4:G80) {
    PATTERN_BITMAP_COLOR[i] = param;
    /* XXX: details */
    CONTEXT_FORMAT.PATTERN_BITMAP_COLOR[i] = cur_grobj.color_format;
} else {
    PATTERN_BITMAP_COLOR[i] = param;
}

mthd 0x318+i*4, i<2: BITMAP [*_PATTERN]
mthd 0x2f8+i*4, i<2: PATTERN_BITMAP [*_2D]
Sets the pattern bitmap. i=0 sets bits 0-31, i=1 sets bits 32-63.

Operation:

tmp = param;
if (cur_grobj.bitmap_format == CGA6 && NV1:G80) { /* XXX: check if also NV4+ */
    /* pattern stored internally in LE format - for CGA6, reverse bits in all bytes */
    tmp = (tmp & 0xaaaaaaaa) >> 1 | (tmp & 0x55555555) << 1;
    tmp = (tmp & 0xcccccccc) >> 2 | (tmp & 0x33333333) << 2;
    tmp = (tmp & 0xf0f0f0f0) >> 4 | (tmp & 0x0f0f0f0f) << 4;
}
PATTERN_BITMAP[i] = tmp;
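The three mask-and-shift steps above swap adjacent bits, then bit pairs, then nibbles, which together reverse the bit order inside every byte while leaving the byte order alone. A straightforward per-byte loop computes the same result and can serve as a reference when checking the shift sequence; `reverse_bits_in_bytes` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of the 8 bits inside each byte of a 32-bit word,
 * keeping each byte at its original position. */
uint32_t reverse_bits_in_bytes(uint32_t word) {
    uint32_t out = 0;
    for (int byte = 0; byte < 4; byte++) {
        uint32_t in = word >> (8 * byte) & 0xff;
        uint32_t rev = 0;
        for (int bit = 0; bit < 8; bit++)
            if (in & (1u << bit))
                rev |= 1u << (7 - bit);
        out |= rev << (8 * byte);
    }
    return out;
}
```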


Color pattern 


The color pattern is always an 8x8 array of R8G8B8 colors. It is stored and uploaded as an array of 64 cells in raster scan order - the color for pattern coordinates (px, py) is taken from PATTERN_COLOR[(py&7) << 3 | (px&7)]. There are 4 sets of methods that set the pattern, corresponding to various color formats. Each set of methods updates the same state internally and converts the written values to R8G8B8 if necessary. The color pattern is available on NV4+ only.


mthd 0x400+i*4, i<16: COLOR_Y8 [NV4_PATTERN]
mthd 0x500+i*4, i<16: PATTERN_COLOR_Y8 [*_2D]
Sets 4 color pattern cells, from Y8 source.
  bits 0-7: color for pattern cell i*4+0
  bits 8-15: color for pattern cell i*4+1
  bits 16-23: color for pattern cell i*4+2
  bits 24-31: color for pattern cell i*4+3

Operation:

PATTERN_COLOR[4*i] = Y8_to_R8G8B8(param[0:7]);
PATTERN_COLOR[4*i+1] = Y8_to_R8G8B8(param[8:15]);
PATTERN_COLOR[4*i+2] = Y8_to_R8G8B8(param[16:23]);
PATTERN_COLOR[4*i+3] = Y8_to_R8G8B8(param[24:31]);


mthd 0x500+i*4, i<32: COLOR_R5G6B5 [NV4_PATTERN]
mthd 0x400+i*4, i<32: PATTERN_COLOR_R5G6B5 [*_2D]
Sets 2 color pattern cells, from R5G6B5 source.
  bits 0-15: color for pattern cell i*2+0
  bits 16-31: color for pattern cell i*2+1

Operation:

PATTERN_COLOR[2*i] = R5G6B5_to_R8G8B8(param[0:15]);
PATTERN_COLOR[2*i+1] = R5G6B5_to_R8G8B8(param[16:31]);


mthd 0x600+i*4, i<32: COLOR_X1R5G5B5 [NV4_PATTERN]
mthd 0x480+i*4, i<32: PATTERN_COLOR_X1R5G5B5 [*_2D]
Sets 2 color pattern cells, from X1R5G5B5 source.
  bits 0-15: color for pattern cell i*2+0
  bits 16-31: color for pattern cell i*2+1

Operation:

PATTERN_COLOR[2*i] = X1R5G5B5_to_R8G8B8(param[0:15]);
PATTERN_COLOR[2*i+1] = X1R5G5B5_to_R8G8B8(param[16:31]);


mthd 0x700+i*4, i<64: COLOR_X8R8G8B8 [NV4_PATTERN]
mthd 0x300+i*4, i<64: PATTERN_COLOR_X8R8G8B8 [*_2D]
Sets a color pattern cell, from X8R8G8B8 source.

Operation:

PATTERN_COLOR[i] = param[0:23];
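The raster-scan storage can be illustrated with the X8R8G8B8 method set, which needs no format conversion. Below is a C sketch with a hypothetical `struct color_pattern` and hypothetical function names. The other three method sets would differ only in how many cells each word fills and in their upconversion to R8G8B8, whose precise formulas are not documented here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 64-cell color pattern (R8G8B8 in bits 0-23). */
struct color_pattern { uint32_t cell[64]; };

/* X8R8G8B8 upload for method index i < 64: the top 8 bits are dropped. */
void pattern_color_x8r8g8b8(struct color_pattern *p, unsigned i, uint32_t param) {
    assert(i < 64);
    p->cell[i] = param & 0xffffff;
}

/* Color for pattern coordinates (px, py); cells are in raster scan order. */
uint32_t color_pattern_lookup(const struct color_pattern *p, uint32_t px, uint32_t py) {
    return p->cell[(py & 7) << 3 | (px & 7)];
}
```

For example, writing cell 2*8+3 sets the color for (px, py) = (3, 2), and because the pattern repeats every 8 pixels, also for (11, 10).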


Todo: precise upconversion formulas 


Context objects 


Contents 


* Context objects
  - Introduction
  - BETA
  - ROP
  - CHROMA and PLANE
  - CLIP
  - BETA4
  - Surface setup
    * SURF
    * SURF2D
    * SURF3D
    * SWZSURF


Introduction


Todo: write me


BETA 


The BETA object family deals with setting the beta factor for the BLEND operation. The objects in this family are:

* objtype 0x01: NV1_BETA [NV1:NV4]
* class 0x0012: NV1_BETA [NV4:G84]

The methods are: 


0100         NOP                  [NV4-]
0104         NOTIFY
0110         WAIT_FOR_IDLE        [G80-]
0140         PM_TRIGGER           [NV40-?] [XXX]
0180 N       DMA_NOTIFY           [NV4-]
0200 O       PATCH_BETA_OUTPUT    [NV4:NV20]
0300         BETA


mthd 0x300: BETA [NV1_BETA]
Sets the beta factor. The parameter is a signed fixed-point number with a sign bit and 31 fractional bits. Note that negative values are clamped to 0, and only 8 fractional bits are actually implemented in hardware.

Operation:

if (param & 0x80000000) /* signed < 0 */
    BETA = 0;
else
    BETA = param & 0x7f800000;
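As a numeric sketch of the method above (function names hypothetical), a parameter of 0x40000000 stands for 0.5, while a parameter whose only set bits lie below bit 23, such as 0x007fffff, truncates to 0. The largest storable value is 255/256.

```c
#include <assert.h>
#include <stdint.h>

/* Transcription of the BETA method: negative values clamp to 0, and only
 * the top 8 of the 31 fractional bits are kept. */
uint32_t beta_method(uint32_t param) {
    if (param & 0x80000000u)    /* sign bit set: value < 0 */
        return 0;
    return param & 0x7f800000u; /* bits 23-30: the implemented fraction bits */
}

/* Hypothetical helper: the stored BETA as a fraction in [0, 1). */
double beta_value(uint32_t beta) {
    return beta / 2147483648.0; /* divide by 2^31 */
}
```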


mthd 0x200: PATCH_BETA_OUTPUT [NV1_BETA] [NV4:NV20]
Reserved for plugging a beta patchcord to output the beta factor into.

Operation:

throw(UNIMPLEMENTED_MTHD);
ROP


The ROP object family deals with setting the ROP [raster operation]. The ROP value thus set is only used in the ROP_* operation modes. The objects in this family are:

* objtype 0x02: NV1_ROP [NV1:NV4]
* class 0x0043: NV1_ROP [NV4:G84]

The methods are:


0100         NOP                  [NV4-]
0104         NOTIFY
0110         WAIT_FOR_IDLE        [G80-]
0140         PM_TRIGGER           [NV40-?] [XXX]
0180 N       DMA_NOTIFY           [NV4-]
0200 O       PATCH_ROP_OUTPUT     [NV4:NV20]
0300         ROP


mthd 0x300: ROP [NV1_ROP]
Sets the raster operation.

Operation:

if (param & ~0xff)
    throw(INVALID_VALUE);
ROP = param;


mthd 0x200: PATCH_ROP_OUTPUT [NV1_ROP] [NV4:NV20]
Reserved for plugging a ROP patchcord to output the ROP into.

Operation:

throw(UNIMPLEMENTED_MTHD);


CHROMA and PLANE 


The CHROMA object family deals with setting the color for the color key. The color key is only used when enabled 
in options for a given graph object. The objects in this family are: 


* objtype 0x03: NV1_CHROMA [NV1:NV4]
* class 0x0017: NV1_CHROMA [NV4:G80]
* class 0x0057: NV4_CHROMA [NV4:G84]


The PLANE object family deals with setting the color for plane masking. The plane mask operation is only done when enabled in options for a given graph object. The objects in this family are:

* objtype 0x04: NV1_PLANE [NV1:NV4]

For both objects, colors are internally stored in A1R10G10B10 format. [XXX: check NV4+]

The methods for these families are:

0100         NOP                  [NV4-]
0104         NOTIFY
0110         WAIT_FOR_IDLE        [G80-]
0140         PM_TRIGGER           [NV40-?] [XXX]
0180 N       DMA_NOTIFY           [NV4-]
0200 O       PATCH_IMAGE_OUTPUT   [NV4:NV20]
0300         COLOR_FORMAT         [NV4-]
0304         COLOR


mthd 0x304: COLOR [*_CHROMA, NV1_PLANE]
Sets the color.

Operation:

struct {
    unsigned B : 10;
    unsigned G : 10;
    unsigned R : 10;
    unsigned A : 1;
} tmp;
tmp.B = get_color_b10(cur_grobj, param);
tmp.G = get_color_g10(cur_grobj, param);
tmp.R = get_color_r10(cur_grobj, param);
tmp.A = get_color_a1(cur_grobj, param);
if (cur_grobj.type == NV1_PLANE)
    PLANE = tmp;
else
    CHROMA = tmp;


Todo: check NV3+ 


mthd 0x200: PATCH_IMAGE_OUTPUT [*_CHROMA, NV1_PLANE] [NV4:NV20]
Reserved for plugging an image patchcord to output the color into.

Operation:

throw(UNIMPLEMENTED_MTHD);


CLIP 


The CLIP object family deals with setting up the user clip rectangle. The user clip rectangle is only used when enabled 
in options for a given graph object. The objects in this family are: 


* objtype 0x05: NV1_CLIP [NV1:NV4]
* class 0x0019: NV1_CLIP [NV4:G84]


The methods for this family are:

0100         NOP                  [NV4-]
0104         NOTIFY
0110         WAIT_FOR_IDLE        [G80-]
0140         PM_TRIGGER           [NV40-?] [XXX]
0180 N       DMA_NOTIFY           [NV4-]
0200 O       PATCH_IMAGE_OUTPUT   [NV4:NV20]
0300         CORNER
0304         SIZE


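The clip rectangle state can be loaded in two ways:

* submit the CORNER method twice, with the upper-left and bottom-right corners
* submit the CORNER method with the upper-left corner, then the SIZE method

To make both sequences work, the clip rectangle methods operate a bit unusually: each of them first moves the previously set bottom-right corner into the upper-left corner, and only then stores the new bottom-right corner.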


Todo: check if still applies on NV3+ 


Note that the clip rectangle state is internally stored relative to the absolute top-left corner of the framebuffer, while the coordinates used in methods are relative to the top-left corner of the canvas.


mthd 0x300: CORNER [NV1_CLIP]
Sets a corner of the clipping rectangle.
  bits 0-15: X coordinate
  bits 16-31: Y coordinate

Operation:

ABS_UCLIP_XMIN = ABS_UCLIP_XMAX;
ABS_UCLIP_YMIN = ABS_UCLIP_YMAX;
ABS_UCLIP_XMAX = CANVAS_MIN.X + param.X;
ABS_UCLIP_YMAX = CANVAS_MIN.Y + param.Y;


Todo: check NV3+ 


mthd 0x304: SIZE [NV1_CLIP]
Sets the size of the clipping rectangle.
  bits 0-15: width
  bits 16-31: height

Operation:

ABS_UCLIP_XMIN = ABS_UCLIP_XMAX;
ABS_UCLIP_YMIN = ABS_UCLIP_YMAX;
ABS_UCLIP_XMAX += param.X;
ABS_UCLIP_YMAX += param.Y;


Todo: check NV3+ 
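The CORNER/SIZE interplay can be checked with a small C model showing that both loading sequences end in the same state. The struct and function names are hypothetical. The sketch treats the 16-bit method fields as unsigned, which the text does not specify, and ignores any clamping or validation the hardware may perform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the absolute user clip rectangle state. */
struct uclip { int32_t xmin, ymin, xmax, ymax; };

/* CORNER: the previous max corner becomes the min corner, and the
 * canvas-relative parameter becomes the new max corner. */
void clip_corner(struct uclip *c, int32_t canvas_min_x, int32_t canvas_min_y,
                 uint32_t param) {
    c->xmin = c->xmax;
    c->ymin = c->ymax;
    c->xmax = canvas_min_x + (int32_t)(param & 0xffff); /* bits 0-15 */
    c->ymax = canvas_min_y + (int32_t)(param >> 16);    /* bits 16-31 */
}

/* SIZE: the previous max corner becomes the min corner, then the max
 * corner is grown by the width and height. */
void clip_size(struct uclip *c, uint32_t param) {
    c->xmin = c->xmax;
    c->ymin = c->ymax;
    c->xmax += (int32_t)(param & 0xffff);
    c->ymax += (int32_t)(param >> 16);
}
```

With a canvas at (100, 50), CORNER(10, 20) followed by either CORNER(40, 60) or SIZE(30, 40) leaves the rectangle at x in [110, 140) and y in [70, 110).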


mthd 0x200: PATCH_IMAGE_OUTPUT [NV1_CLIP] [NV4:NV20]
Reserved for plugging an image patchcord to output the rectangle into.

Operation:

throw(UNIMPLEMENTED_MTHD);


BETA4


The BETA4 object family deals with setting the per-component beta factors for the BLEND_PREMULT and SRCCOPY_PREMULT operations. The objects in this family are:

* class 0x0072: NV4_BETA4 [NV4:G84]

The methods are:

0100         NOP                  [NV4-]
0104         NOTIFY
0110         WAIT_FOR_IDLE        [G80-]
0140         PM_TRIGGER           [NV40-?] [XXX]
0180 N       DMA_NOTIFY           [NV4-]
0200 O       PATCH_BETA_OUTPUT    [NV4:NV20]
0300         BETA4


mthd 0x300: BETA4 [NV4_BETA4]
Sets the per-component beta factors.
  bits 0-7: B
  bits 8-15: G
  bits 16-23: R
  bits 24-31: A

Operation:

/* XXX: figure it out */


mthd 0x200: PATCH_BETA_OUTPUT [NV4_BETA4] [NV4:NV20]
Reserved for plugging a beta patchcord to output beta factors into.

Operation:

throw(UNIMPLEMENTED_MTHD);


Surface setup 
Todo: write me


SURF 


Todo: write me 


SURF2D 


Todo: write me 


SURF3D 


Todo: write me 


SWZSURF 


Todo: write me 


2D solid shape rendering 


Contents 


* 2D solid shape rendering
  - Introduction
  - Source objects
    * Common methods
    * POINT
    * LINE/LIN
    * TRI
    * RECT
  - Unified 2d object
  - Rasterization rules
    * Points and rectangles
    * Lines and lins
    * Triangles


Introduction 


One of the 2d engine functions is drawing solid [single-color] primitives. The solid drawing functions use the usual 2D pipeline as described in graph/2d.txt and are available on all cards. The primitives supported are:

* points [NV1:NV4 and G80+]
* lines [NV1:NV4]
* lins [half-open lines]
* triangles
* upright rectangles [edges parallel to X/Y axes]

The 2d engine is limited to integer vertex coordinates [ie. all primitive vertices must lie in pixel centres].


On NV1:G84 cards, the solid drawing functions are exposed via separate source object types for each type of primitive. On G80+, all solid drawing functionality is exposed via the unified 2d object.


Source objects 


Each supported primitive type has its own source object class family on NV1:G80. These families are: 
* POINT [NV1:NV4]
* LINE [NV1:NV4]
* LIN [NV1:G84]
* TRI [NV1:G84]
* RECT [NV1:NV40]


Common methods 


The common methods accepted by all solid source objects are: 


0100         NOP                  [NV4-] [graph/intro.txt]
0104         NOTIFY               [graph/intro.txt]
010c         PATCH                [NV4:?] [graph/2d.txt]
0110         WAIT_FOR_IDLE        [G80-] [graph/intro.txt]
0140         PM_TRIGGER           [NV40-?] [graph/intro.txt]
0180 N       DMA_NOTIFY           [NV4-] [graph/intro.txt]
0184 N       NV1_CLIP             [NV5-] [graph/2d.txt]
0188 N       NV1_PATTERN          [NV5-] [NV1_*] [graph/2d.txt]
0188 N       NV4_PATTERN          [NV5-] [NV4_* and up] [graph/2d.txt]
018c N       NV1_ROP              [NV5-] [graph/2d.txt]
0190 N       NV1_BETA             [NV5-] [graph/2d.txt]
0194 N       NV3_SURFACE          [NV5-] [NV1_*] [graph/2d.txt]
0194 N       NV4_BETA4            [NV5-] [NV4_* and up] [graph/2d.txt]
0198 N       NV4_SURFACE          [NV5-] [NV4_* and up] [graph/2d.txt]
02fc N       OPERATION            [NV5-] [graph/2d.txt]
0300         COLOR_FORMAT         [NV4-] [graph/solid.txt]
0304         COLOR                [graph/solid.txt]


Todo: PM TRIGGER? 
Todo: PATCH?


Todo: add the patchcord methods 


Todo: document common methods 


POINT 


The POINT object family draws single points. The objects are:

* objtype 0x08: NV1_POINT [NV1:NV4]

The methods are:

0100:0400             [common solid rendering methods]
0400+i*4, i<32        POINT_XY
0480+i*8, i<16        POINT32_X
0484+i*8, i<16        POINT32_Y
0500+i*8, i<16        CPOINT_COLOR
0504+i*8, i<16        CPOINT_XY


Todo: document point methods 


LINE/LIN 


The LINE/LIN object families draw lines/lins, respectively. The objects are:

* objtype 0x09: NV1_LINE [NV1:NV4]
* objtype 0x0a: NV1_LIN [NV1:NV4]
* class 0x001c: NV1_LIN [NV4:NV40]
* class 0x005c: NV4_LIN [NV4:G80]
* class 0x035c: NV30_LIN [NV30:NV40]
* class 0x305c: NV30_LIN [NV40:G84]

The methods are:

0100:0400             [common solid rendering methods]
0400+i*8, i<16        LINE_START_XY
0404+i*8, i<16        LINE_END_XY
0480+i*16, i<8        LINE32_START_X
0484+i*16, i<8        LINE32_START_Y
0488+i*16, i<8        LINE32_END_X
048c+i*16, i<8        LINE32_END_Y
0500+i*4, i<32        POLYLINE_XY
0580+i*8, i<16        POLYLINE32_X
0584+i*8, i<16        POLYLINE32_Y
0600+i*8, i<16        CPOLYLINE_COLOR
0604+i*8, i<16        CPOLYLINE_XY


Todo: document line methods 


TRI 


The TRI object family draws triangles. The objects are:

* objtype 0x0b: NV1_TRI [NV1:NV4]
* class 0x0014: NV1_TRI [NV4:NV40]
* class 0x0054: NV4_TRI [NV4:G84]

The methods are:

0100:0400                 [common solid rendering methods]
0310+j*4, j<3             TRIANGLE_XY
0320+j*8, j<3             TRIANGLE32_X
0324+j*8, j<3             TRIANGLE32_Y
0400+i*4, i<32            TRIMESH_XY
0480+i*8, i<16            TRIMESH32_X
0484+i*8, i<16            TRIMESH32_Y
0500+i*16                 CTRIANGLE_COLOR
0504+i*16+j*4, j<3        CTRIANGLE_XY
0580+i*8, i<16            CTRIMESH_COLOR
0584+i*8, i<16            CTRIMESH_XY


Todo: document tri methods 


RECT 


The RECT object family draws upright rectangles. GDI [graph/nv3-gdi.txt] is another object family that can draw solid rectangles, and it should be used instead of RECT on cards that lack RECT. The objects are:

* objtype 0x0c: NV1_RECT [NV1:NV3]
* objtype 0x07: NV1_RECT [NV3:NV4]
* class 0x001e: NV1_RECT [NV4:NV40]
* class 0x005e: NV4_RECT [NV4:NV40]

The methods are:

0100:0400             [common solid rendering methods]
0400+i*8, i<16        RECT_POINT
0404+i*8, i<16        RECT_SIZE


Todo: document rect methods 


Unified 2d object 


Todo: document solid-related unified 2d object methods 


Rasterization rules 


This section describes the exact rasterization rules for solids, ie. which pixels are considered to be part of a given solid. The common variables appearing in the pseudocode are:

* CLIP_MIN_X - the left boundary of the final clipping rectangle. If the user clipping rectangle [see graph/2d.txt] is enabled, this is max(UCLIP_MIN_X, CANVAS_MIN_X). Otherwise, this is CANVAS_MIN_X.
* CLIP_MAX_X - the right boundary of the final clipping rectangle. If the user clipping rectangle is enabled, this is min(UCLIP_MAX_X, CANVAS_MAX_X). Otherwise, this is CANVAS_MAX_X.
* CLIP_MIN_Y - the top boundary of the final clipping rectangle, defined like CLIP_MIN_X
* CLIP_MAX_Y - the bottom boundary of the final clipping rectangle, defined like CLIP_MAX_X

A pixel is considered to be inside the clipping rectangle if:

* CLIP_MIN_X <= x < CLIP_MAX_X and
* CLIP_MIN_Y <= y < CLIP_MAX_Y
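A C sketch of how the final clip rectangle and the inside test fit together. The `struct rect` type and the function names are hypothetical. The sketch assumes the user clip rectangle is already expressed in the same absolute coordinates as the canvas, with maxima exclusive as in the test above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rectangle type; max_x / max_y are exclusive. */
struct rect { int min_x, min_y, max_x, max_y; };

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Final clip rectangle: the canvas, intersected with the user clip
 * rectangle when that is enabled. */
struct rect final_clip(struct rect canvas, struct rect uclip, bool uclip_enable) {
    if (!uclip_enable)
        return canvas;
    struct rect r = {
        imax(uclip.min_x, canvas.min_x), imax(uclip.min_y, canvas.min_y),
        imin(uclip.max_x, canvas.max_x), imin(uclip.max_y, canvas.max_y),
    };
    return r;
}

bool in_clip(struct rect clip, int x, int y) {
    return clip.min_x <= x && x < clip.max_x &&
           clip.min_y <= y && y < clip.max_y;
}
```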


Points and rectangles 


A rectangle is defined through the coordinates of its left-top corner [X, Y] and its width and height [W, H] in pixels. A rectangle covers pixels that have x in [X, X+W) and y in [Y, Y+H) ranges.


void SOLID_RECT(int X, int Y, int W, int H) {
    int L = max(X, CLIP_MIN_X);
    int R = min(X+W, CLIP_MAX_X);
    int T = max(Y, CLIP_MIN_Y);
    int B = min(Y+H, CLIP_MAX_Y);
    int x, y;
    for (y = T; y < B; y++)
        for (x = L; x < R; x++)
            DRAW_PIXEL(x, y, SOLID_COLOR);
}


A point is defined through its X, Y coordinates and is rasterized as if it was a rectangle with W=H=1.


void SOLID_POINT(int X, int Y) {
    SOLID_RECT(X, Y, 1, 1);
}


Lines and lins 


Lines and lins are defined through the coordinates of two endpoints [X[2], Y[2]]. They are rasterized via a variant of Bresenham's line algorithm, with the following characteristics:

* rasterization proceeds in the direction of increasing x for y-major lines, and in the direction of increasing y for x-major lines [ie. in the direction of increasing minor component]
* when presented with a tie in the decision whether to increase the minor coordinate or not, increase it
* if rasterizing a lin, the X[1], Y[1] pixel is not rasterized, but calculations are otherwise unaffected
* pixels outside the clipping rectangle are not rasterized, but calculations are otherwise unaffected

Equivalently, the rasterized lines/lins match those constructed via the diamond-exit rule with the following characteristics:

* a pixel is rasterized if the diamond inside it intersects the line/lin, unless it's a lin and the diamond also contains the second endpoint
* pixels outside the clipping rectangle are not rasterized, but calculations are otherwise unaffected
* pixel centres are considered to be on integer coordinates
* the following coordinates are considered to be contained in the diamond for pixel X, Y:
  - abs(x-X) + abs(y-Y) < 0.5 [ie. the inside of the diamond]
  - x = X-0.5, y = Y [ie. the leftmost vertex of the diamond]
  - x = X, y = Y-0.5 [ie. the top vertex of the diamond]

[note that the edges don't matter, other than at the vertices - it's impossible to create a line touching them without intersecting them, due to integer endpoint coordinates]
void SOLID_LINE_LIN(int X[2], int Y[2], int is_lin) {
    /* determine minor/major direction */
    int xmajor = abs(X[0] - X[1]) > abs(Y[0] - Y[1]);
    int min0, min1, maj0, maj1;
    if (xmajor) {
        maj0 = X[0];
        maj1 = X[1];
        min0 = Y[0];
        min1 = Y[1];
    } else {
        maj0 = Y[0];
        maj1 = Y[1];
        min0 = X[0];
        min1 = X[1];
    }
    if (min1 < min0) {
        /* order by increasing minor */
        swap(min0, min1);
        swap(maj0, maj1);
    }
    /* deltas */
    int dmin = min1 - min0;
    int dmaj = abs(maj1 - maj0);
    /* major step direction */
    int step = maj1 > maj0 ? 1 : -1;
    int min, maj;
    /* scaled error - real error is err/(dmin * dmaj * 2) */
    int err = 0;
    for (min = min0, maj = maj0; maj != maj1 + step; maj += step) {
        if (err >= dmaj) { /* error >= 1/(dmin*2) */
            /* error too large, increase minor */
            min++;
            err -= dmaj * 2; /* error -= 1/dmin */
        }
        int x = xmajor ? maj : min;
        int y = xmajor ? min : maj;
        /* if not the final pixel of a lin and inside the clipping region, draw it */
        if ((!is_lin || x != X[1] || y != Y[1]) && in_clip(x, y))
            DRAW_PIXEL(x, y, SOLID_COLOR);
        err += dmin * 2; /* error += 1/dmaj */
    }
}
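To experiment with these rules, the pseudocode above can be transcribed into self-contained C. In this sketch (all names hypothetical), DRAW_PIXEL is replaced by appending to an output array, in_clip by an explicit clip rectangle with exclusive maxima, and swap by inline exchanges. The stepping logic is otherwise kept as written.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical output record standing in for DRAW_PIXEL. */
struct pix { int x, y; };

/* Transcription of SOLID_LINE_LIN; returns the number of pixels recorded. */
int solid_line_lin(const int X[2], const int Y[2], int is_lin,
                   int clip_min_x, int clip_min_y, int clip_max_x, int clip_max_y,
                   struct pix *out, int max_out) {
    int xmajor = abs(X[0] - X[1]) > abs(Y[0] - Y[1]);
    int maj0 = xmajor ? X[0] : Y[0], maj1 = xmajor ? X[1] : Y[1];
    int min0 = xmajor ? Y[0] : X[0], min1 = xmajor ? Y[1] : X[1];
    if (min1 < min0) { /* order by increasing minor */
        int t = min0; min0 = min1; min1 = t;
        t = maj0; maj0 = maj1; maj1 = t;
    }
    int dmin = min1 - min0;
    int dmaj = abs(maj1 - maj0);
    int step = maj1 > maj0 ? 1 : -1;
    int err = 0, n = 0;
    for (int min = min0, maj = maj0; maj != maj1 + step; maj += step) {
        if (err >= dmaj) { /* ties increase the minor coordinate */
            min++;
            err -= dmaj * 2;
        }
        int x = xmajor ? maj : min;
        int y = xmajor ? min : maj;
        int inside = clip_min_x <= x && x < clip_max_x &&
                     clip_min_y <= y && y < clip_max_y;
        if ((!is_lin || x != X[1] || y != Y[1]) && inside && n < max_out)
            out[n++] = (struct pix){x, y};
        err += dmin * 2;
    }
    return n;
}
```

For example, the line from (0, 0) to (2, 1) yields (0, 0), (1, 1), (2, 1): at x = 1 the exact y is 0.5, a tie, so the minor coordinate is increased. Drawn as a lin, the same endpoints stop before (2, 1).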


Triangles 


Triangles are defined through the coordinates of three vertices [X[3], Y[3]]. A triangle is rasterized as an intersection of three half-planes, corresponding to the three edges. For the purpose of triangle rasterization, half-planes are defined as follows:

* the edges are (0, 1), (1, 2) and (2, 0)
* if the two vertices making an edge overlap, the triangle is degenerate and is not rasterized
* a pixel is considered to be in a half-plane corresponding to a given edge if it's on the same side of that edge as the third vertex of the triangle [the one not included in the edge]
* if the third vertex lies on the edge, the triangle is degenerate and will not be rasterized
* if the pixel being considered for rasterization lies on the edge, it's considered included in the half-plane if the pixel immediately to its right is included in the half-plane
* if that pixel also lies on the edge [ie. edge is exactly horizontal], the original pixel is instead considered included if the pixel immediately below it is included in the half-plane

Equivalently, a triangle will include exactly-horizontal top edges and left edges, but not exactly-horizontal bottom edges nor right edges.


void SOLID_TRI(int X[3], int Y[3]) {
    int cross = (X[1] - X[0]) * (Y[2] - Y[0]) - (X[2] - X[0]) * (Y[1] - Y[0]);
    if (cross == 0) /* degenerate triangle */
        return;
    /* coordinates in CW order */
    if (cross < 0) {
        swap(X[1], X[2]);
        swap(Y[1], Y[2]);
    }
    int x, y, e;
    for (y = CLIP_MIN_Y; y < CLIP_MAX_Y; y++)
        for (x = CLIP_MIN_X; x < CLIP_MAX_X; x++) {
            for (e = 0; e < 3; e++) {
                int x0 = X[e];
                int y0 = Y[e];
                int x1 = X[(e+1)%3];
                int y1 = Y[(e+1)%3];
                /* first attempt */
                cross = (x1 - x0) * (y - y0) - (x - x0) * (y1 - y0);
                /* second attempt - pixel to the right */
                if (cross == 0)
                    cross = (x1 - x0) * (y - y0) - (x + 1 - x0) * (y1 - y0);
                /* third attempt - pixel below */
                if (cross == 0)
                    cross = (x1 - x0) * (y + 1 - y0) - (x - x0) * (y1 - y0);
                if (cross < 0)
                    goto out;
            }
            DRAW_PIXEL(x, y, SOLID_COLOR);
out:        ;
        }
}
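The same half-plane rules can be evaluated for a single pixel, which is convenient for checking which edges a triangle owns. This C sketch (hypothetical names, clipping left out) reorders the vertices so the orientation test is positive instead of swapping them in place, then applies the right-then-below tie rule to each edge.

```c
#include <assert.h>
#include <stdbool.h>

/* Signed edge function for edge (x0,y0)->(x1,y1) evaluated at (x,y);
 * same expression as the `cross` computations in SOLID_TRI. */
static long edge(int x0, int y0, int x1, int y1, int x, int y) {
    return (long)(x1 - x0) * (y - y0) - (long)(x - x0) * (y1 - y0);
}

/* Returns true if pixel (x, y) is covered by the triangle under the rules
 * above; degenerate triangles cover nothing. */
bool tri_covers(const int X[3], const int Y[3], int x, int y) {
    long area = edge(X[0], Y[0], X[1], Y[1], X[2], Y[2]);
    if (area == 0)
        return false;
    int o1 = area > 0 ? 1 : 2, o2 = area > 0 ? 2 : 1; /* make area positive */
    int vx[3] = {X[0], X[o1], X[o2]}, vy[3] = {Y[0], Y[o1], Y[o2]};
    for (int e = 0; e < 3; e++) {
        int x0 = vx[e], y0 = vy[e], x1 = vx[(e + 1) % 3], y1 = vy[(e + 1) % 3];
        long c = edge(x0, y0, x1, y1, x, y);
        if (c == 0)
            c = edge(x0, y0, x1, y1, x + 1, y); /* on the edge: pixel to the right */
        if (c == 0)
            c = edge(x0, y0, x1, y1, x, y + 1); /* horizontal edge: pixel below */
        if (c < 0)
            return false;
    }
    return true;
}
```

For the triangle (0, 0), (4, 0), (0, 4), this covers the 10 pixels with x + y <= 3: the horizontal top edge and the vertical left edge are included, while pixels on the sloped right edge, such as (2, 2), are not.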


2D image from CPU upload 


Contents 


* 2D image from CPU upload
  - Introduction
  - IFC
  - BITMAP
  - SIFC
  - INDEX
  - TEXTURE


Introduction 


Todo: write me 


IFC 


Todo: write me 


BITMAP 


Todo: write me 


SIFC 


Todo: write me 


INDEX 


Todo: write me 


TEXTURE 


Todo: write me 


BLIT object 


Contents 


* BLIT object
  - Introduction
  - Methods
  - Operation


Introduction 


Todo: write me 


Methods 


Todo: write me 


Operation 


Todo: write me 


Image to/from memory objects 


Contents 


* Image to/from memory objects
- Introduction
- Methods
- IFM operation
- ITM operation


Introduction 


Todo: write me 


Methods 
Todo: write me


IFM operation 


Todo: write me 


ITM operation 


Todo: write me 


NV1 textured quad objects 


Contents 


* NV1 textured quad objects
- Introduction
- The methods
- Linear interpolation process
- Quadratic interpolation process


Introduction 


Todo: write me 


The methods 


Todo: write me 


Linear interpolation process 


Todo: write me 
Quadratic interpolation process


Todo: write me 


GDI objects 


Contents 


* GDI objects
- Introduction
- Methods
* Clipped rectangles
* Unclipped rectangles
* Unclipped transparent bitmaps
* Clipped transparent bitmaps
* Clipped two-color bitmaps


Introduction 


Todo: write me 


Methods 


Todo: write me 


Clipped rectangles 


Todo: write me 


Unclipped rectangles 


Todo: write me 
Unclipped transparent bitmaps


Todo: write me 


Clipped transparent bitmaps 


Todo: write me 


Clipped two-color bitmaps 


Todo: write me 


Scaled image from memory object 


Contents 


* Scaled image from memory object
- Introduction
- Methods
- Operation


Introduction 


Todo: write me 


Methods 


Todo: write me 


Operation 


Todo: write me 
YCbCr blending objects


Contents 


* YCbCr blending objects
- Introduction
- Methods
- Operation


Introduction 


Todo: write me 


Methods 


Todo: write me 


Operation 


Todo: write me 


2.9.4 NV1 graphics engine 


Contents: 


2.9.5 NV3 graphics engine 


Contents: 


NV3 3D objects 


Contents 


* NV3 3D objects
- Introduction


Todo: write me


Introduction 


Todo: write me 


2.9.6 NV4 graphics engine


Contents: 


NV4 3D objects 


Contents 


* NV4 3D objects
- Introduction


Todo: write me 


Introduction 


Todo: write me 


2.9.7 NV10 Celsius graphics engine 


Contents: 


NV10 Celsius 3D objects 


Contents 


* NV10 Celsius 3D objects
- Introduction


Todo: write me 
Introduction


Todo: write me 


2.9.8 NV20 Kelvin graphics engine 


Contents: 


NV20 Kelvin 3D objects 


Contents 


* NV20 Kelvin 3D objects
- Introduction


Todo: write me 


Introduction 


Todo: write me 


2.9.9 NV30 Rankine graphics engine 


Contents: 


NV30 Rankine 3D objects 


Contents 


* NV30 Rankine 3D objects
- Introduction


Todo: write me 
Introduction


Todo: write me 


2.9.10 NV40 Curie graphics engine


Contents: 


NV40 Curie 3D objects 


Contents 


* NV40 Curie 3D objects
- Introduction


Todo: write me 


Introduction 


Todo: write me 


2.9.11 G80 Tesla graphics and compute engine 


Contents: 


G80 PGRAPH context switching 


Contents 


* G80 PGRAPH context switching
- Introduction


Introduction 


Todo: write me 
G80 Tesla 3D objects


Contents 


* G80 Tesla 3D objects
- Introduction


Todo: write me 


Introduction 


Todo: write me 


G80 Tesla compute objects 


Contents 


* G80 Tesla compute objects
- Introduction


Todo: write me 


Introduction 


Todo: write me 


Tesla CUDA processors 


Contents: 


Tesla CUDA ISA 


Contents 


* Tesla CUDA ISA
- Introduction
* Variants
* Warps and thread types
* Registers
* Memory
* Other execution state and resources
- Instruction format
* Other fields
* Predicates
* $c destination field
* Memory addressing
* Shared memory access
* Destination fields
* Short source fields
* Long source fields


Introduction


This file describes the Tesla CUDA instruction set. CUDA stands for Compute Unified Device Architecture; the name
refers to the fact that all types of shaders (vertex, geometry, fragment, and compute) use nearly the same ISA and
execute on the same processors (called streaming multiprocessors).


The Tesla CUDA ISA is used on Tesla generation GPUs (G8x, G9x, G200, GT21x, MCP77, MCP79, MCP89). Older
GPUs have separate ISAs for vertex and fragment programs. Newer GPUs use the Fermi, Kepler2, or Maxwell ISAs.


Variants 


There are several variants of the Tesla ISA (and the corresponding multiprocessors). The features added to the ISA after
the first iteration are:


* breakpoints [G84:]
* new barriers [G84:]
* atomic operations on g[] space [G84:]
* load from s[] instruction [G84:]
* lockable s[] memory [G200:]
* double-precision floating point [G200 only]
* 64-bit atomic add on g[] space [G200:]
* vote instructions [G200:]
* D3D10.1 additions [GT215:]:
- $sampleid register (for sample shading)
- texprep cube instruction (for cubemap array access)
- texquerylod instruction
- texgather instruction
* preret and indirect bra instructions [GT215:]?


Todo: check variants for preret/indirect bra 


Warps and thread types 


Programs on Tesla MPs are executed in units called “warps”. A warp is a group of 32 individual threads executed
together. All threads in a warp share a common instruction pointer, and always execute the same instruction, but have
otherwise independent state (ie. separate register sets). This doesn't preclude independent branching: when threads in 
a warp disagree on a branch condition, one direction is taken and the other is pushed onto a stack for further processing. 
Each of the divergent execution paths is tagged with a "thread mask": a bitmask of threads in the warp that satisfied 
(or not) the branch condition, and hence should be executed. The MP does no work (and modifies no state) for threads 
not covered by the current thread mask. Once the first path reaches completion, the stack is popped, restoring target 
PC and thread mask for the second path, and execution continues. 
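A toy model of the divergence mechanism just described may help. It is a sketch under stated assumptions: the function names are hypothetical, which side of a divergent branch runs first is an assumption (the taken side here, since the text does not say), and reconvergence instructions such as joinat are not modelled.

```python
FULL_MASK = 0xFFFFFFFF  # one bit per thread of a 32-thread warp


def branch(stack, active, cond, target_pc, next_pc):
    """Conditional branch for one warp.

    active: current thread mask; cond: mask of threads whose condition is true.
    Returns the (pc, mask) to continue with. If the warp diverges, the
    not-taken path is pushed onto `stack` to be executed later.
    """
    taken = active & cond
    not_taken = active & ~cond & FULL_MASK
    if taken and not_taken:
        # Divergence: run the taken path first, remember the other one.
        stack.append((next_pc, not_taken))
        return target_pc, taken
    if taken:
        return target_pc, taken
    return next_pc, not_taken


def path_done(stack):
    """Called when the current path completes: restore PC and mask of the saved path."""
    return stack.pop()
```

Threads outside the returned mask do no work until the saved entry is popped, which matches the behaviour described above.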


Depending on warp type, the threads in a warp may be related to each other or not. There are 4 warp types,
corresponding to 4 program types:


* vertex programs: executed once for each vertex submitted to the 3d pipeline. They're grouped into warps in a
rather uninteresting way. Each thread has read-only access to its vertex's input attributes and write-only access
to its vertex's output attributes.


* geometry programs: if enabled, executed once for each geometry primitive submitted to the 3d pipeline. Also
grouped into warps in an uninteresting way. Each thread has read-only access to input attributes of its primitive's
vertices and per-primitive attributes. Each thread also has write-only access to output vertex attributes and
instructions to emit a vertex and break the output primitive.


* fragment programs: executed once for each fragment rendered by the 3d pipeline. Always dispatched in groups
of 4, called quads, corresponding to aligned 2x2 squares on the screen (if some of the fragments in the square
are not being rendered, the fragment program is run on them anyway, and its result discarded). This grouping
is done so that approximate screen-space derivatives of all intermediate results can be computed by exchanging
data with other threads in the quad. The quads are then grouped into warps in an uninteresting way. Each thread
has read-only access to interpolated attribute data and is expected to return the pixel data to be written to the
render output surface.


* compute programs: dispatched in units called blocks. Blocks are submitted manually by the user, alone or in
so-called grids (basically big 2d arrays of blocks with identical parameters). The user also determines how many
threads are in a block. The threads of a block are sequentially grouped into warps. All warps of a block execute
in parallel on a single MP, and have access to so-called shared memory. Shared memory is a fast per-block area
of memory, and its size is selected by the user as part of block configuration. Compute warps also have random
R/W access to so-called global memory areas, which can be arbitrarily mapped to card VM by the user.
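For compute programs, the sequential grouping of a block's threads into warps can be sketched as follows. The helper names are hypothetical, and representing a partially filled last warp by a thread mask with the unused lanes cleared is an assumption about how such a warp is tracked.

```python
WARP_SIZE = 32


def block_warp_masks(num_threads):
    """Thread masks of the warps a block of num_threads threads is split into.

    Threads are assigned to warps sequentially, so every warp is full except
    possibly the last one.
    """
    masks = []
    for first in range(0, num_threads, WARP_SIZE):
        count = min(WARP_SIZE, num_threads - first)
        masks.append((1 << count) - 1)
    return masks


def thread_position(tid):
    """(warp index within the block, lane index within the warp) of thread tid."""
    return divmod(tid, WARP_SIZE)
```

For example, a 70-thread block needs three warps, the last of which has only 6 active lanes.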


Registers 


The registers in Tesla ISA are: 


* up to 128 32-bit GPRs per thread: $r0-$r127. These registers are used for all calculations (with the exception of
some address calculations), whether integer or floating-point.


The amount of available GPRs per thread is chosen by the user as part of MP configuration, and can be selected
per program type. For example, if the user enables 16 registers, $r0-$r15 will be usable and $r16-$r127 will
be forced to 0. Since the MP has a rather limited amount of storage for GPRs, this configuration parameter
determines how many active warps will fit simultaneously on an MP.


If a 16-bit operation is to be performed, each GPR from the $r0-$r63 range can be treated as a pair of 16-bit registers:
$rXl (low half of $rX) and $rXh (high half of $rX).


If a 64-bit operation is to be performed, any naturally aligned pair of GPRs can be treated as a 64-bit register:
$rXd (which has the low half in $rX and the high half in $r(X+1); X has to be even). Likewise, if a 128-bit
operation is to be performed, any naturally aligned group of 4 registers can be treated as a 128-bit register:
$rXq. The 32-bit chunks are assigned to $rX..$r(X+3) in order from lowest to highest.


* 4 16-bit address registers per thread: $a1-$a4, and one additional register per warp ($a7). These registers are
used for addressing all memory spaces except global memory (which uses 32-bit addressing via the $r register file).
In addition to the 4 per-thread registers and 1 per-warp register, there's also $a0, which is always equal to 0.


Todo: wtf is up with $a7


* 4 4-bit condition code registers per thread: $c0-$c3. These registers can be optionally set as a result of some
(mostly arithmetic) instructions and are made of 4 individual bits:


- bit 0: Z - zero flag. For integer operations, set when the result is equal to 0. For floating-point operations,
set when the result is 0 or NaN.


- bit 1: S - sign flag. For integer operations, set when the high bit of the result is equal to 1. For floating-point
operations, set when the result is negative or NaN.


- bit 2: C - carry flag. For integer addition, set when there is a carry out of the highest bit of the result.


- bit 3: O - overflow flag. For integer addition, set when the true (infinite-precision) result doesn't fit in the
destination (considered to be a signed number).


* A few read-only 32-bit special registers, $sr0-$sr8:
- $sr0 aka $physid: when read, returns the physical location of the current thread on the GPU:
* bits 0-7: thread index (inside a warp)
* bits 8-15: warp index (on an MP)
* bits 16-19: MP index (on a TPC)
* bits 20-23: TPC index


- $sr1 aka $clock: when read, returns the MP clock tick counter.


Todo: a bit more detail? 


- $sr2: always 0? 


Todo: perhaps we missed something? 


- $sr3 aka $vstride: attribute stride, determines the spacing between subsequent attributes of a single vertex 
in the input space. Useful only in geometry programs. 


Todo: seems to always be 0x20. Is it really that boring, or does MP switch to a smaller/bigger stride 
sometimes? 


- $sr4-$sr7 aka $pm0-$pm3: MP performance counters. 
- $sr8 aka $sampleid [GT215:]: the sample ID. Useful only in fragment programs when sample shading is
enabled.
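The flag definitions for integer addition and the $physid layout from the register list can be made concrete with a short sketch. The function names are hypothetical; the $c value is packed with Z in bit 0, S in bit 1, C in bit 2 and O in bit 3, as listed. Whether every Tesla add variant sets C and O exactly this way is not verified here beyond the description above.

```python
MASK32 = 0xFFFFFFFF


def add_with_flags(a, b):
    """32-bit integer add; returns (result, $c value) with Z, S, C, O in bits 0-3."""
    a &= MASK32
    b &= MASK32
    full = a + b
    res = full & MASK32
    z = int(res == 0)              # bit 0: result is zero
    s = res >> 31 & 1              # bit 1: high bit of the result
    c = int(full > MASK32)         # bit 2: carry out of bit 31
    # bit 3: signed overflow, i.e. both inputs differ in sign from the result
    o = ((a ^ res) & (b ^ res)) >> 31 & 1
    return res, z | s << 1 | c << 2 | o << 3


def decode_physid(value):
    """Split a $physid ($sr0) value into its documented fields."""
    return {
        "thread": value & 0xFF,     # bits 0-7: thread index inside the warp
        "warp": value >> 8 & 0xFF,  # bits 8-15: warp index on the MP
        "mp": value >> 16 & 0xF,    # bits 16-19: MP index on the TPC
        "tpc": value >> 20 & 0xF,   # bits 20-23: TPC index
    }
```

For instance, 0xFFFFFFFF + 1 wraps to 0 with Z and C set, while 0x7FFFFFFF + 1 sets S and O.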


Memory 


The memory spaces in Tesla ISA are: 


* C[]: code space. 24-bit, byte-oriented addressing. The only way to access this space is by executing code from
it (there's no "read from code space" instruction). There is one code space for each program type, and it's
mapped to a 16MB range of VM space by the user. It has three levels of cache (global, TPC, MP) that need to
be manually flushed when its contents are modified by the user.


* c0[]-c15[]: const spaces. 16-bit, byte-oriented addressing. Read-only and accessible from any program type in
8, 16, and 32-bit units. Like C[], it has three levels of cache. Each of the 16 const spaces of each program type
can be independently bound to one of 128 global (per channel) const buffers. In turn, each of the const buffers
can be independently bound to a range of VM space (with length divisible by 256) or disabled by the user.


* l[]: local space. 16-bit, byte-oriented addressing. Read-write and per-thread, accessible from any program type
in 8, 16, 32, 64, and 128-bit units. It's directly mapped to VM space (although with heavy address mangling),
and hence slow. Its per-thread length can be set to any power of two size between 0x10 and 0x10000 bytes, or
to 0.


* a[]: attribute space. 16-bit, byte-oriented addressing. Read-only, per-thread, accessible in 32-bit units only
and only available in vertex and geometry programs. In vertex programs, contains input vertex attributes. In
geometry programs, contains pointers to vertices in p[] space and per-primitive attributes.


* p[]: primitive space. 16-bit, byte-oriented addressing. Read-only, per-MP, available only from geometry
programs, accessed in 32-bit units. Contains input vertex attributes.


* o[]: output space. 16-bit, byte-oriented addressing. Write-only, per-thread. Available only from vertex and
geometry programs, accessed in 32-bit units. Contains output vertex attributes.


* v[]: varying space. 16-bit, byte-oriented addressing. Read-only, available only from fragment programs,
accessed in 32-bit units. Contains interpolated input vertex attributes. It's a “virtual” construct: there are really
three words stored in MP for each v[] word (base, dx, dy) and reading from v[] space will calculate the value
for the current fragment by evaluating the corresponding linear function.


* s[]: shared space. 16-bit, byte-oriented addressing. Read-write, per-block, available only from compute
programs, accessible in 8, 16, and 32-bit units. Length per block can be selected by user in 0x40-byte increments
from 0 to 0x4000 bytes. On G200+, has a locked access feature: every warp can have one locked location in
s[], and all other warps will block when trying to access this location. Load with lock and store with unlock
instructions can thus be used to implement atomic operations.


* g0[]-g15[]: global spaces. 32-bit, byte-oriented addressing. Read-write, available only from compute programs,
accessible in 8, 16, 32, 64, and 128-bit units. Each global space can be configured in either linear or 2d mode.
When in linear mode, a global space is simply mapped to a range of VM memory. When in 2d mode, the low 16
bits of a gX[] address are the x coordinate, and the high 16 bits are the y coordinate. The global space is then mapped
to a blocklinear 2d surface in VM space. On G84+, some atomic operations on global spaces are supported.


Todo: 


when no-one's looking, rename the a[], p[], v[] spaces to something sane.
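The size limits and the 2d-mode address split from the memory space list are simple enough to express directly. This is a sketch with hypothetical helper names; it only encodes the rules stated in the list.

```python
def global_2d_coords(addr):
    """Split a gX[] address in 2d mode into (x, y): low 16 bits are x, high 16 bits are y."""
    return addr & 0xFFFF, addr >> 16 & 0xFFFF


def valid_local_size(size):
    """l[] per-thread length: 0, or a power of two from 0x10 to 0x10000 bytes."""
    return size == 0 or (0x10 <= size <= 0x10000 and (size & (size - 1)) == 0)


def valid_shared_size(size):
    """s[] per-block length: 0 to 0x4000 bytes in 0x40-byte increments."""
    return 0 <= size <= 0x4000 and size % 0x40 == 0
```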


Other execution state and resources 


There's also a fair bit of implicit state stored per-warp for control flow: 
* 22-bit PC (24-bit address with low 2 bits forced to 0): the current address in C[] space where instructions are
executed.


* 32-bit active thread mask: selects which threads are executed and which are not. If a bit is 1 here, instructions
will be executed for the given thread.


* 32-bit invisible thread mask: useful only in fragment programs. If a bit is 1 here, the given thread is unused,
or corresponds to a pixel on the screen which won't be rendered (ie. was just launched to fill a quad). Texture
instructions with the “live” flag set won't be run for such threads.


* 32*2-bit thread state: stores the state of each thread:
- 0: active or branched off
- 1: executed the brk instruction
- 2: executed the ret instruction
- 3: executed the exit instruction


* Control flow stack. The stack is made of 64-bit entries, with the following fields:
- PC
- thread mask
- entry type:
* 1: branch
* 2: call
* 3: call with limit
* 4: prebreak
* 5: quadon
* 6: joinat


Todo: discard mask should be somewhere too? 


Todo: call limit counter 


Other resources available to CUDA code are: 
* $t0-$t129: up to 130 textures per 3d program type, up to 128 for compute programs. 


* $s0-$s17: up to 18 texture samplers per 3d program type, up to 16 for compute programs. Only used if linked 
texture samplers are disabled. 


* Up to 16 barriers. Per-block and available in compute programs only. A barrier is basically a warp counter: a
barrier can be increased or waited for. When a warp increases a barrier, its value is increased by 1. If a barrier
would be increased to a value equal to a given warp count, it's set to 0 instead. When a barrier is waited for by
a warp, the warp is blocked until the barrier's value is equal to 0.


Todo: there's some weirdness in barriers. 
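The counter behaviour of a barrier can be modelled in a few lines. This sketch uses a hypothetical class name, assumes the usual pattern in which every participating warp increases the barrier once before waiting on it, and does not model the unexplained behaviour mentioned in the todo note.

```python
class Barrier:
    """Per-block barrier modelled as a warp counter (sketch)."""

    def __init__(self, warp_count):
        self.warp_count = warp_count  # number of warps expected to arrive
        self.value = 0

    def increase(self):
        """A warp increases the barrier; reaching warp_count wraps the value to 0."""
        self.value += 1
        if self.value == self.warp_count:
            self.value = 0

    def may_proceed(self):
        """A waiting warp stays blocked until the value is back at 0."""
        return self.value == 0
```

With 3 warps, the value goes 1, 2, then wraps to 0 on the third increase, which releases all waiters.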
Instruction format


Instructions are stored in C[] space as 32-bit little-endian words. There are short (1 word) and long (2 words)
instructions. The instruction type can be distinguished as follows:


word 0 bits 0-1 | word 1 bits 0-1 | instruction type
0               | -               | short normal
1               | 0               | long normal
1               | 1               | long normal with join
1               | 2               | long normal with exit
1               | 3               | long immediate
2               | -               | short control
3               | any             | long control


Todo: you sure of control instructions with non-0 w1b0-1?


Long instructions can only be stored on addresses divisible by 8 bytes (ie. on even word addresses). In other words,
short instructions usually have to be issued in pairs (the only exception is when a block starts with a short instruction
on an odd word address). This is not a problem, as all short instructions have a long equivalent. Attempting to execute
a non-aligned long instruction results in an UNALIGNED_LONG_INSTRUCTION decode error.


Long normal instructions can have a join or exit instruction tacked on. In this case, the extra instruction is executed 
together with the main instruction. 


The instruction group is determined by the opcode fields: 
* word 0 bits 28-31: primary opcode field 
* word 1 bits 29-31: secondary opcode field (long instructions only) 
Note that only long immediate and long control instructions always have the secondary opcode equal to 0. 


The exact instruction within an instruction group is determined by group-specific encoding. Attempting to execute an
instruction whose primary/secondary opcode doesn't map to a valid instruction group results in an ILLEGAL_OPCODE
decode error.
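The type table, the alignment rule and the primary opcode field can be combined into a small decoder that walks a code buffer. This is a sketch with hypothetical names: it classifies words only, reports the alignment violation as a Python exception named after the decode error, and does not check whether the opcode maps to a valid group.

```python
LONG_NON_CONTROL = ("long normal", "long normal with join",
                    "long normal with exit", "long immediate")


def insn_type(word0, word1=None):
    """Classify an instruction from the low 2 bits of its word(s)."""
    kind = word0 & 3
    if kind == 0:
        return "short normal"
    if kind == 2:
        return "short control"
    if kind == 3:
        return "long control"
    return LONG_NON_CONTROL[word1 & 3]  # kind == 1: word 1 bits 0-1 decide


def split_insns(words):
    """Walk a list of 32-bit words, returning (word index, type, primary opcode)."""
    out = []
    i = 0
    while i < len(words):
        w0 = words[i]
        primary = w0 >> 28 & 0xF          # word 0 bits 28-31
        if w0 & 1:                         # bits 0-1 equal to 1 or 3: long
            if i % 2:
                raise ValueError("UNALIGNED_LONG_INSTRUCTION")
            out.append((i, insn_type(w0, words[i + 1]), primary))
            i += 2
        else:                              # bits 0-1 equal to 0 or 2: short
            out.append((i, insn_type(w0), primary))
            i += 1
    return out
```

The secondary opcode of a long instruction would be extracted the same way, as `word1 >> 29 & 7`.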


Other fields 


Other fields used in instructions are quite instruction-specific. However, some common bitfields exist. For short 
normal instructions, these are: 


* bits 0-1: 0 (select short normal instruction)
* bits 2-7: destination
* bit 8: modifier 1
* bits 9-14: source 1
* bit 15: modifier 2
* bits 16-21: source 2
* bit 22: modifier 3
* bit 23: source 2 type
* bit 24: source 1 type
* bit 25: $a postincrement flag
* bits 26-27: address register
* bits 28-31: primary opcode

For long immediate instructions: 

* word 0:
- bits 0-1: 1 (select long non-control instruction)
- bits 2-7: destination
- bit 8: modifier 1
- bits 9-14: source 1
- bit 15: modifier 2
- bits 16-21: immediate low 6 bits
- bit 22: modifier 3
- bit 23: unused
- bit 24: source 1 type
- bit 25: $a postincrement flag
- bits 26-27: address register
- bits 28-31: primary opcode
* word 1:
- bits 0-1: 3 (select long immediate instruction)
- bits 2-27: immediate high 26 bits
- bit 28: unused
- bits 29-31: always 0
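The 32-bit immediate of a long immediate instruction is split across both words. A sketch with hypothetical names, reassembling it from the two fields listed above and splitting it back:

```python
def long_immediate(word0, word1):
    """Reassemble the 32-bit immediate of a long immediate instruction."""
    low = word0 >> 16 & 0x3F           # word 0 bits 16-21: immediate low 6 bits
    high = word1 >> 2 & 0x3FFFFFF      # word 1 bits 2-27: immediate high 26 bits
    return high << 6 | low


def split_immediate(imm):
    """Inverse: the immediate bits as placed in (word 0, word 1), other fields zero."""
    imm &= 0xFFFFFFFF
    return (imm & 0x3F) << 16, (imm >> 6) << 2
```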

For long normal instructions: 


* word 0:
- bits 0-1: 1 (select long non-control instruction)
- bits 2-8: destination
- bits 9-15: source 1
- bits 16-22: source 2
- bit 23: source 2 type
- bit 24: source 3 type
- bit 25: $a postincrement flag
- bits 26-27: address register low 2 bits
- bits 28-31: primary opcode
* word 1:
- bits 0-1: 0 (no extra instruction), 1 (join) or 2 (exit)
- bit 2: address register high bit
- bit 3: destination type
- bits 4-5: destination $c register
- bit 6: $c write enable
- bits 7-11: predicate
- bits 12-13: source $c register
- bits 14-20: source 3
- bit 21: source 1 type
- bits 22-25: c[] space index
- bit 26: modifier 1
- bit 27: modifier 2
- bit 28: unused
- bits 29-31: secondary opcode


Note that short and long immediate instructions have 6-bit source/destination fields, while long normal instructions
have 7-bit ones. This means only half the registers can be accessed in short and long immediate instructions ($r0-$r63, $r0l-$r31h).
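The long normal layout above translates directly into a field extractor. This sketch uses hypothetical names and returns raw field values only; interpreting the type fields (register vs. memory operands) is left out.

```python
def bits(word, lo, hi):
    """Extract bits lo..hi (inclusive) of a 32-bit word."""
    return word >> lo & ((1 << (hi - lo + 1)) - 1)


def decode_long_normal(w0, w1):
    """Pull the common fields out of a long normal instruction."""
    return {
        "dst": bits(w0, 2, 8),
        "src1": bits(w0, 9, 15),
        "src2": bits(w0, 16, 22),
        "src3": bits(w1, 14, 20),
        "postincr": bits(w0, 25, 25),
        # 3-bit $a selector: low 2 bits in word 0, high bit in word 1
        "areg": bits(w0, 26, 27) | bits(w1, 2, 2) << 2,
        "extra": {0: None, 1: "join", 2: "exit"}.get(bits(w1, 0, 1)),
        "dst_type": bits(w1, 3, 3),
        "cdst": bits(w1, 4, 5),
        "cdst_enable": bits(w1, 6, 6),
        "pred": bits(w1, 7, 11),
        "csrc": bits(w1, 12, 13),
        "cspace": bits(w1, 22, 25),
        "primary": bits(w0, 28, 31),
        "secondary": bits(w1, 29, 31),
    }
```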


For long control instructions: 
* word 0:
- bits 0-1: 3 (select long control instruction)
- bits 9-24: code address low 18 bits
- bits 28-31: primary opcode
* word 1:
- bit 6: modifier 1
- bits 7-11: predicate
- bits 12-13: source $c register
- bits 14-19: code address high 6 bits
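The two code address fields hold 16 + 6 = 22 bits, which matches the 22-bit PC (a 24-bit byte address with the low 2 bits forced to 0). The sketch below assumes, on that basis, that the 16-bit field holds byte address bits 2-17 and the 6-bit field holds bits 18-23; the function name is hypothetical.

```python
def control_target(word0, word1):
    """Byte address in C[] encoded in a long control instruction (sketch).

    Assumes word 0 bits 9-24 are address bits 2-17 (bits 0-1 are always 0)
    and word 1 bits 14-19 are address bits 18-23.
    """
    low = word0 >> 9 & 0xFFFF     # word 0 bits 9-24
    high = word1 >> 14 & 0x3F     # word 1 bits 14-19
    return low << 2 | high << 18
```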


Todo: what about other bits? ignored or must be 0? 


Note that many other bitfields can be in use, depending on instruction. These are just the most common ones. 


Whenever a half-register ($rXl or $rXh) is stored in a field, bit 0 of that field selects the high or low part (0 is low, 1 is
high), and bits 1 and up select the $r index. Whenever a double register ($rXd) is stored in a field, the index of the low
word register is stored. If the value stored is not divisible by 2, the instruction is illegal. Likewise, for quad registers
($rXq), the lowest word register is stored, and the index has to be divisible by 4.
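A sketch of how a register field value maps to a register name under these rules (hypothetical helper names; the illegal-instruction case is reported as a Python exception):

```python
def half_reg(field):
    """Name of the half register selected by a field: bit 0 picks the half."""
    return "$r%d%s" % (field >> 1, "h" if field & 1 else "l")


def double_reg(field):
    """Name of the 64-bit register; the stored index must be even."""
    if field % 2:
        raise ValueError("illegal instruction: odd double register index")
    return "$r%dd" % field


def quad_reg(field):
    """Name of the 128-bit register; the stored index must be divisible by 4."""
    if field % 4:
        raise ValueError("illegal instruction: misaligned quad register index")
    return "$r%dq" % field
```

A 6-bit field therefore reaches $r0l-$r31h, and a 7-bit field reaches $r0l-$r63h.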


Predicates 


Most long normal and long control instructions can be predicated. A predicated instruction is only executed if a 
condition, computed based on a selected $c register, evaluates to 1. The instruction fields involved in predicates are: 


* word 1 bits 7-11: predicate field - selects a boolean function of the $c register 
* word 1 bits 12-13: $c source field - selects the $c register to use


The predicates are: 


encoding  name    description                   condition formula
0x00      never   always false                  0
0x01      l       less than                     (S & ~Z) ^ O
0x02      e       equal                         Z & ~S
0x03      le      less than or equal            S ^ (Z | O)
0x04      g       greater than                  ~Z & ~(S ^ O)
0x05      lg      less or greater than          ~Z
0x06      ge      greater than or equal         ~(S ^ O)
0x07      lge     ordered                       ~Z | ~S
0x08      u       unordered                     Z & S
0x09      lu      less than or unordered        S ^ O
0x0a      eu      equal or unordered            Z
0x0b      leu     not greater than              Z | (S ^ O)
0x0c      gu      greater than or unordered     ~S ^ (Z | O)
0x0d      lgu     not equal to                  ~Z | S
0x0e      geu     not less than                 (~S | Z) ^ O
0x0f      always  always true                   1
0x10      o       overflow                      O
0x11      c       carry / unsigned not below    C
0x12      a       unsigned above                ~Z & C
0x13      s       sign / negative               S
0x1c      ns      not sign / positive           ~S
0x1d      na      unsigned not above            Z | ~C
0x1e      nc      not carry / unsigned below    ~C
0x1f      no      no overflow                   ~O
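The predicate table can be turned into an evaluator, which also makes one of its regularities checkable: for encodings 0x00-0x0f, encoding n and encoding 0x0f - n are complements of each other. This is a sketch with hypothetical names, operating on a 4-bit $c value laid out as Z (bit 0), S (bit 1), C (bit 2), O (bit 3).

```python
def _not(bit):
    return bit ^ 1  # logical NOT of a single 0/1 flag


# encoding -> (name, function of the Z, S, C, O flags)
PREDICATES = {
    0x00: ("never", lambda z, s, c, o: 0),
    0x01: ("l", lambda z, s, c, o: (s & _not(z)) ^ o),
    0x02: ("e", lambda z, s, c, o: z & _not(s)),
    0x03: ("le", lambda z, s, c, o: s ^ (z | o)),
    0x04: ("g", lambda z, s, c, o: _not(z) & _not(s ^ o)),
    0x05: ("lg", lambda z, s, c, o: _not(z)),
    0x06: ("ge", lambda z, s, c, o: _not(s ^ o)),
    0x07: ("lge", lambda z, s, c, o: _not(z) | _not(s)),
    0x08: ("u", lambda z, s, c, o: z & s),
    0x09: ("lu", lambda z, s, c, o: s ^ o),
    0x0a: ("eu", lambda z, s, c, o: z),
    0x0b: ("leu", lambda z, s, c, o: z | (s ^ o)),
    0x0c: ("gu", lambda z, s, c, o: _not(s) ^ (z | o)),
    0x0d: ("lgu", lambda z, s, c, o: _not(z) | s),
    0x0e: ("geu", lambda z, s, c, o: (_not(s) | z) ^ o),
    0x0f: ("always", lambda z, s, c, o: 1),
    0x10: ("o", lambda z, s, c, o: o),
    0x11: ("c", lambda z, s, c, o: c),
    0x12: ("a", lambda z, s, c, o: _not(z) & c),
    0x13: ("s", lambda z, s, c, o: s),
    0x1c: ("ns", lambda z, s, c, o: _not(s)),
    0x1d: ("na", lambda z, s, c, o: z | _not(c)),
    0x1e: ("nc", lambda z, s, c, o: _not(c)),
    0x1f: ("no", lambda z, s, c, o: _not(o)),
}


def predicate(encoding, cval):
    """Evaluate a predicate on a 4-bit $c value (Z bit 0, S bit 1, C bit 2, O bit 3)."""
    z, s, c, o = cval & 1, cval >> 1 & 1, cval >> 2 & 1, cval >> 3 & 1
    return bool(PREDICATES[encoding][1](z, s, c, o))
```

Since a floating-point NaN result sets both Z and S, the ordered comparisons (l, e, le, g, ge) come out false for it while u is true.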


Some instructions read $c registers directly. The operand CSRC refers to the $c register selected by the $c source field. 
Note that, on such instructions, the $c register used for predicating is necessarily the same as the input register. Thus, 
one must generally avoid predicating instructions with $c input. 


$c destination field 


Most normal long instructions can optionally write status information about their result to a $c register. The $c
destination is selected by the $c destination field, located in word 1 bits 4-5, and the $c destination enable field, located in
word 1 bit 6. The operands using these fields are:


* FCDST (forced condition destination): $c0-$c3, as selected by the $c destination field.
* CDST (condition destination):
- if the $c destination enable field is 0, no destination is used (condition output is discarded).
- if the $c destination enable field is 1, same as FCDST.


Memory addressing 


Some instructions can access one of the memory spaces available to CUDA code. There are two kinds of such 
instructions: 
* Ordinary instructions that happen to be used with memory operands. They have a very limited direct addressing
range (since they fit the address in 6 or 7 bits normally used for register selection) and may lack indirect
addressing capabilities.


* Dedicated load/store instructions. They have full 16-bit direct addressing range and have indirect addressing 
capabilities. 


The following instruction fields are involved in memory addressing: 
* word 0 bit 25: autoincrement flag
* word 0 bits 26-27: $a low field
* word 1 bit 2: $a high field
* word 0 bits 9-24: long offset field (used for dedicated load/store instructions)
There are two operands used in memory addressing: 
* SASRC (short address source): $a0-$a3, as selected by $a low field. 
* LASRC (long address source): $a0-$a7, as selected by concatenation of $a low and high fields. 


Every memory operand has an associated offset field and multiplication factor (a constant, usually equal to the access 
size). Memory operands also come in two kinds: direct (no $a field) and indirect ($a field used). 


For direct operands, the memory address used is simply the value of the offset field times the multiplication factor. 
For indirect operands, the memory address used depends on the value of the autoincrement flag: 


* if the flag is 0, the memory address used is $aX + offset * factor, where the $a register is selected by the SASRC (for
short and long immediate instructions) or LASRC (for long normal instructions) operand. Note that using $a0
with this addressing mode can emulate a direct operand.


* if the flag is 1, the memory address used is simply $aX, but after the memory access is done, $aX will be
increased by offset * factor. Attempting to use $a0 (or $a5/$a6) with this addressing mode results in an
ILLEGAL_POSTINCR decode error.
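The three addressing cases (direct, indirect, indirect with postincrement) can be summarised in one function. This is a sketch with hypothetical names: `aregs` is a plain list standing in for $a0-$a7, the use of $a7 is not modelled specially, and truncating the updated register to 16 bits is an assumption based on the register width.

```python
def memory_address(aregs, offset, factor, areg=None, postincr=False):
    """Address of a memory operand (sketch).

    aregs: list of the $a0-$a7 values (aregs[0] stays 0); areg None means a
    direct operand. With postincr, aregs[areg] is updated in place.
    """
    if areg is None:                      # direct operand
        return offset * factor
    if not postincr:                      # indirect: $aX + offset * factor
        return aregs[areg] + offset * factor
    if areg in (0, 5, 6):
        raise ValueError("ILLEGAL_POSTINCR")
    addr = aregs[areg]                    # the access uses the old $aX value...
    # ...then $aX is increased; wrapping to 16 bits is an assumption.
    aregs[areg] = (addr + offset * factor) & 0xFFFF
    return addr
```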


Todo: figure out where and how $a7 can be used. Seems to be a decode error more often than not... 


Todo: what address field is used in long control instructions? 


Shared memory access 


Most instructions can use an s[] memory access as the first source operand. When s[] access is used, it can be used in 
one of 4 modes: 


* 0: u8 - read a byte with zero extension, multiplication factor is 1
* 1: u16 - read a half-word with zero extension, factor is 2
* 2: s16 - read a half-word with sign extension, factor is 2
* 3: b32 - read a word, factor is 4


The corresponding source 1 field is split into two subfields. The high 2 bits select s[] access mode, while the low 4 or 
5 bits select the offset. Shared memory operands are always indirect operands. The operands are: 


* SSSRC1 (short shared word source 1): use short source 1 field, all modes valid.
* LSSRC1 (long shared word source 1): use long source 1 field, all modes valid.
* SSHSRC1 (short shared halfword source 1): use short source 1 field, valid modes u8, u16, s16.
* LSHSRC1 (long shared halfword source 1): use long source 1 field, valid modes u8, u16, s16.
* SSUHSRC1 (short shared unsigned halfword source 1): use short source 1 field, valid modes u8, u16.
* LSUHSRC1 (long shared unsigned halfword source 1): use long source 1 field, valid modes u8, u16.
* SSSHSRC1 (short shared signed halfword source 1): use short source 1 field, valid modes u8, s16.
* LSSHSRC1 (long shared signed halfword source 1): use long source 1 field, valid modes u8, s16.
* LSBSRC1 (long shared byte source 1): use long source 1 field, only u8 mode valid.


Attempting to use b32 mode when it's not valid (because source 1 has 16-bit width) results in an
ILLEGAL_MEMORY_SIZE decode error. Attempting to use a u16/s16 mode that is invalid because the sign is wrong
results in an ILLEGAL_MEMORY_SIGN decode error. Attempting to use a mode other than u8 for a cvt instruction with
a u8 source results in an ILLEGAL_MEMORY_BYTE decode error.
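Decoding an s[] source 1 operand combines the mode subfield, the offset subfield and the indirect addressing rule. A sketch with hypothetical names; it assumes little-endian byte order in shared memory, returns the value as a 32-bit register would hold it (s16 sign-extended), and skips the per-operand mode validity checks listed above.

```python
# mode -> (name, multiplication factor); the factor equals the access size here
S_MODES = {0: ("u8", 1), 1: ("u16", 2), 2: ("s16", 2), 3: ("b32", 4)}


def shared_load(smem, aval, field, offset_bits):
    """Read an s[] source 1 operand.

    smem: bytes-like shared memory; aval: value of the selected $a register;
    field: the raw source 1 field; offset_bits: 4 for the 6-bit short field,
    5 for the 7-bit long field. Returns a 32-bit value.
    """
    mode = field >> offset_bits & 3               # high 2 bits: access mode
    offset = field & ((1 << offset_bits) - 1)     # low bits: offset
    name, factor = S_MODES[mode]
    addr = aval + offset * factor                 # always an indirect operand
    raw = smem[addr:addr + factor]
    value = int.from_bytes(raw, "little", signed=(name == "s16"))
    return value & 0xFFFFFFFF                     # sign-extended s16 becomes 32-bit
```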


Destination fields 


Most short and long immediate instructions use the short destination field for selecting instruction destination. The 
field is located in word 0 bits 2-7. There are two common operands using that field: 


* SDST (short word destination): GPR $r0-$r63, as selected by the short destination field. 
* SHDST (short halfword destination): GPR half $r0l-$r31h, as selected by the short destination field.


Most normal long instructions use the long destination field for selecting instruction destination. The field is located 
in word 0 bits 2-8. This field is usually used together with destination type field, located in word 1 bit 3. The common 
operands using these fields are: 


* LRDST (long register word destination): GPR $r0-$r127, as selected by the long destination field. 


* LRHDST (long register halfword destination): GPR half $r0l-$r63h, as selected by the long destination field.


* LDST (long word destination): 
— if destination type field is 0, same as LRDST. 


— if destination type field is 1, and long destination field is equal to 127, no destination is used (ie. operation 
result is discarded). This is used on instructions that are executed only for their $c output. 


— if destination type field is 1, and long destination field is not equal to 127, o[] space is written, as a direct 
memory operand with long destination field as the offset field and multiplier factor 4. 


Todo: verify the 127 special treatment part and direct addressing 


* LHDST (long halfword destination): 
— if destination type field is 0, same as LRHDST. 


— if destination type field is 1, and long destination field is equal to 127, no destination is used (ie. operation 
result is discarded). 


— if destination type field is 1, and long destination field is not equal to 127, o[] space is written, as a direct 
memory operand with long destination field as the offset field and multiplier factor 2. Since o[] can only 
be written with 32-bit accesses, the address is rounded down to a multiple of 4, and the 16-bit result is 
duplicated in both low and high half of the 32-bit value written in o[] space. This makes it pretty much 
useless. 
* LQDST (long quad destination): GPR quad $r0q-$r124q, as selected by the long destination field.


* LDDST (long double destination): GPR pair $r0d-$r126d, as selected by the long destination field.


Short source fields


Todo: write me 


Long source fields 


Todo: write me 


Opcode map


Table 11: Opcode map

primary | short normal | long immediate | long normal sec. 0 | sec. 1 | sec. 2 | sec. 3 | sec. 4 | sec. 5 | sec. 6 | sec. 7 | short control | long control
0x0 | - | - | ld a[] | mov from $c | mov from $a | mov from $sr | st o[] | mov to $c | shl to $a | st s[] | - | discard
0x1 | mov | mov | mov | ld c[] | ld s[] | vote | - | - | - | - | - | bra
0x2 | add/sub | add/sub | add/sub | - | - | - | - | - | - | - | - | call
0x3 | add/sub | add/sub | add/sub | - | - | set | max | min | shl | shr | - | ret
0x4 | mul | mul | mul | - | - | - | - | - | - | - | - | prebrk
0x5 | sad | - | sad | - | - | - | - | - | - | - | - | brk
0x6 | mul+add | mul+add | mul+add | mul+add | mul+add | mul+add | mul+add | mul+add | mul+add | mul+add | - | quadon
0x7 | mul+add | mul+add | mul+add | mul+add | mul+add | mul+add | mul+add | mul+add | mul+add | mul+add | - | quadpop
0x8 | interp | - | interp | - | - | - | - | - | - | - | - | bar
0x9 | rcp | - | rcp | - | rsqrt | lg2 | sin | cos | ex2 | - | trap | trap
0xa | - | - | cvti2i | cvti2i | cvti2f | cvti2f | cvtf2i | cvtf2i | cvtf2f | cvtf2f | - | joinat
0xb | fadd | fadd | fadd | fadd | - | fset | fmax | fmin | presin/preex2 | - | brkpt | brkpt
0xc | fmul | fmul | fmul | - | fslct | fslct | quadop | - | - | - | - | bra c[]
0xd | - | logic op | logic op | add $a | ld l[] | st l[] | ld g[] | st g[] | red g[] | atomic g[] | - | preret
0xe | fmul+fadd | fmul+fadd | fmul+fadd | fmul+fadd | dfma | dadd | dmul | dmin | dmax | dset | - | -
0xf | texauto/fetch | - | texauto/fetch | texbias | texlod | tex misc | texcsaa/gather | ?? | emit/restart | nop/pmevent | - | -
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Instructions 


The instructions are roughly divided into the following groups: 
* Data movement instructions 
* Integer arithmetic instructions 
* Floating point instructions 
* Transcendental instructions
* Double precision floating point instructions 
* Control instructions 
* Texture instructions 


* Misc instructions 


Data movement instructions 


Contents 


* Data movement instructions
  - Introduction
  - Data movement: (h)mov
  - Condition registers
    * Reading condition registers: mov (from $c)
    * Writing condition registers: mov (to $c)
  - Address registers
    * Reading address registers: mov (from $a)
    * Writing address registers: shl (to $a)
    * Increasing address registers: add ($a)
  - Reading special registers: mov (from $sr)
  - Memory space access
    * Const space access: ld c[]
    * Local space access: ld l[], st l[]
    * Shared space access: ld s[], st s[]
    * Input space access: ld a[]
    * Output space access: st o[]
  - Global space access
    * Global load/stores: ld g[], st g[]
    * Global atomic operations: ld (add|inc|dec|max|min|and|or|xor) g[], xchg g[], cas g[]
    * Global reduction operations: (add|inc|dec|max|min|and|or|xor) g[]
Introduction


Todo: write me 


Data movement: (h)mov 


Todo: write me 


[lanemask] mov b32/b16 DST SRC

lanemask assumed 0xf for short and immediate versions.
if (lanemask & 1 << (laneid & 3)) DST = SRC;

Short: 0x10000000 base opcode
0x00008000 0: b16, 1: b32
operands: S*DST, S*SRC1/S*SHARED

Imm: 0x10000000 base opcode
0x00008000 0: b16, 1: b32
operands: L*DST, IMM

Long: 0x10000000 0x00000000 base opcode
0x00000000 0x04000000 0: b16, 1: b32
0x00000000 0x0003c000 lanemask
operands: LL*DST, L*SRC1/L*SHARED
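The lanemask rule above can be modelled for one quad with a small Python sketch. It assumes, as the pseudocode implies, that bit `laneid & 3` of lanemask gates each thread's write; `masked_mov` and `quad_mov` are hypothetical helper names, not part of any existing tool.

```python
def masked_mov(dst, src, lanemask, laneid):
    """Return the new DST value for one thread of a masked mov."""
    # Bit (laneid & 3) of lanemask decides whether this lane writes DST.
    if lanemask & (1 << (laneid & 3)):
        return src
    return dst


def quad_mov(dsts, srcs, lanemask, first_lane=0):
    """Apply masked_mov to the four threads of one quad."""
    return [
        masked_mov(d, s, lanemask, first_lane + i)
        for i, (d, s) in enumerate(zip(dsts, srcs))
    ]


# lanemask 0xf (the short/immediate default) writes every lane.
print(quad_mov([0, 0, 0, 0], [1, 2, 3, 4], 0xf))     # [1, 2, 3, 4]
print(quad_mov([0, 0, 0, 0], [1, 2, 3, 4], 0b0101))  # [1, 0, 3, 0]
```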


Condition registers 


Reading condition registers: mov (from $c) 


Todo: write me 


mov DST COND 
DST is 32-bit $r. 
DST = COND; 


Long: 0x00000000 0x20000000 base opcode 
operands: LDST, COND 


Writing condition registers: mov (to $c) 


228 Chapter 2. nVidia hardware documentation 


nVidia Hardware Documentation, Release git 


Todo: write me 


mov CDST SRC 


SRC is 32-bit $r. Yes, the 0x40 $c write enable flag in the second word is
actually ignored.
CDST = SRC;


Long: 0x00000000 0xa0000000 base opcode
operands: CDST, LSRC1


Address registers 


Reading address registers: mov (from $a) 


Todo: write me 


mov DST AREG 


DST is 32-bit $r. Setting the flag normally used for autoincrement mode doesn't
work, but it still causes a crash when using a non-writable $a.


DST = AREG;


Long: 0x00000000 0x40000000 base opcode
0x02000000 0x00000000 crashy flag
operands: LDST, AREG
operands: LDST, AREG 


Writing address registers: shl (to $a) 


Todo: write me 


shl ADST SRC SHCNT 
SRC is 32-bit $r. 
ADST = SRC << SHCNT; 


Long: 0x00000000 0xc0000000 base opcode
operands: ADST, LSRC1/LSHARED, HSHCNT 


Increasing address registers: add ($a) 
Todo: write me


add ADST AREG OFFS 


Like mov from $a, setting the flag normally used for autoincrement mode doesn't
work, but it still causes a crash when using a non-writable $a.


ADST = AREG + OFFS; 


Long: 0xd0000000 0x20000000 base opcode 
0x02000000 0x00000000 crashy flag 
operands: ADST, AREG, OFFS 


Reading special registers: mov (from $sr) 


Todo: write me 


mov DST physid    S=0
mov DST clock     S=1
mov DST sreg2     S=2
mov DST sreg3     S=3
mov DST pm0       S=4
mov DST pm1       S=5
mov DST pm2       S=6
mov DST pm3       S=7


DST is 32-bit $r.


DST = SREG;


Long: 0x00000000 0x60000000 base opcode 
0x00000000 0x0001c000 S 
operands: LDST 


Memory space access 


Const space access: ld c[]


Todo: write me 


Local space access: ld l[], st l[]


Todo: write me 
Shared space access: ld s[], st s[]


Todo: write me 


mov lock CDST DST s[]
Tries to lock a word of s[] memory and load a word from it. CDST tells
you whether the word was successfully locked and loaded. A successfully
locked word can't be locked by any other thread until it is unlocked.


mov unlock s[] SRC 


Stores a word to previously-locked s[] word and unlocks it. 


Input space access: ld a[]


Todo: write me 


Output space access: st o[] 


Todo: write me 


Global space access 


Global load/stores: ld g[], st g[]


Todo: write me 


Global atomic operations: ld (add|inc|dec|max|min|and|or|xor) g[], xchg g[], cas g[]


Todo: write me 


Global reduction operations: (add|inc|dec|max|min|and|or|xor) g[]


Todo: write me 
Integer arithmetic instructions


Contents 


* Integer arithmetic instructions
  - Introduction
  - Addition/subtraction: (h)add, (h)sub, (h)subr, (h)addc
  - Multiplication: mul(24)
  - Multiply-add: madd(24), msub(24), msubr(24), maddc(24)
  - Sum of absolute differences: sad, hsad
  - Min/max selection: (h)min, (h)max
  - Comparison: set, hset
  - Bitwise operations: (h)and, (h)or, (h)xor, (h)mov2
  - Bit shifts: (h)shl, (h)shr, (h)sar


Introduction


Todo: write me


S(x): bit 31 of x for 32-bit x, bit 15 of x for 16-bit x.
SEX(x): sign-extension of x
ZEX(x): zero-extension of x
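A Python sketch of the S, SEX and ZEX helpers, assuming two's complement values held in unbounded Python integers (so SEX yields a negative Python int rather than a wider bit pattern). The explicit `bits` parameter is an addition for illustration; the pseudocode infers the width from the operand size.

```python
def S(x, bits):
    """Sign bit: bit 31 of a 32-bit value, bit 15 of a 16-bit value."""
    return (x >> (bits - 1)) & 1


def ZEX(x, bits):
    """Zero-extension: keep the low `bits` bits, higher bits become 0."""
    return x & ((1 << bits) - 1)


def SEX(x, bits):
    """Sign-extension: read the low `bits` bits as two's complement."""
    x = ZEX(x, bits)
    return x - (1 << bits) if S(x, bits) else x


print(SEX(0xffff, 16), ZEX(0xffff, 16))  # -1 65535
```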


Addition/subtraction: (h)add, (h)sub, (h)subr, (h)addc


Todo: write me 


add [sat] b32/b16 [CDST] DST SRC1 SRC2          O2=0, O1=0
sub [sat] b32/b16 [CDST] DST SRC1 SRC2          O2=0, O1=1
subr [sat] b32/b16 [CDST] DST SRC1 SRC2         O2=1, O1=0
addc [sat] b32/b16 [CDST] DST SRC1 SRC2 COND    O2=1, O1=1

All operands are 32-bit or 16-bit according to size specifier.

b16/b32 s1, s2;
bool c;
switch (OP) {
    case add: s1 = SRC1, s2 = SRC2, c = 0; break;
    case sub: s1 = SRC1, s2 = ~SRC2, c = 1; break;
    case subr: s1 = ~SRC1, s2 = SRC2, c = 1; break;
    case addc: s1 = SRC1, s2 = SRC2, c = COND.C; break;
}
res = s1+s2+c; // infinite precision
CDST.C = res >> (b32 ? 32 : 16);
res = res & (b32 ? 0xffffffff : 0xffff);
CDST.O = (S(s1) == S(s2)) && (S(s1) != S(res));
if (sat && CDST.O)
    if (S(res)) res = (b32 ? 0x7fffffff : 0x7fff);
    else res = (b32 ? 0x80000000 : 0x8000);
CDST.S = S(res);
CDST.Z = res == 0;
DST = res;

Short/imm: 0x20000000 base opcode
0x10000000 O2 bit
0x00400000 O1 bit
0x00008000 0: b16, 1: b32
0x00000100 sat flag
operands: S*DST, S*SRC1/S*SHARED, S*SRC2/S*CONST/IMM, $c0
Long: 0x20000000 0x00000000 base opcode
0x10000000 0x00000000 O2 bit
0x00400000 0x00000000 O1 bit
0x00000000 0x04000000 0: b16, 1: b32
0x00000000 0x08000000 sat flag
operands: MCDST, LL*DST, L*SRC1/L*SHARED, L*SRC3/L*CONST3, COND
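The add/sub/subr/addc pseudocode translates almost line for line into the Python sketch below, which can help when checking flag behaviour by hand. It assumes the `bits` and `sat` parameters stand in for the size specifier and sat flag, and returns the flags as a dict; `addsub` is a hypothetical name.

```python
def addsub(op, src1, src2, bits=32, sat=False, cond_c=0):
    """Model of add/sub/subr/addc; returns (DST, flags)."""
    mask = (1 << bits) - 1

    def sign(v):
        return (v >> (bits - 1)) & 1

    src1, src2 = src1 & mask, src2 & mask
    if op == "add":
        s1, s2, c = src1, src2, 0
    elif op == "sub":
        s1, s2, c = src1, ~src2 & mask, 1
    elif op == "subr":
        s1, s2, c = ~src1 & mask, src2, 1
    elif op == "addc":
        s1, s2, c = src1, src2, cond_c
    else:
        raise ValueError(op)

    res = s1 + s2 + c                  # infinite precision
    flag_c = res >> bits               # carry out of the top bit
    res &= mask
    flag_o = int(sign(s1) == sign(s2) and sign(s1) != sign(res))
    if sat and flag_o:
        # a result with the sign bit set means the true sum overflowed upwards
        res = (mask >> 1) if sign(res) else (1 << (bits - 1))
    return res, {"C": flag_c, "O": flag_o, "S": sign(res), "Z": int(res == 0)}


print(addsub("sub", 5, 3))  # (2, {'C': 1, 'O': 0, 'S': 0, 'Z': 0})
```

Because sub is computed as SRC1 + ~SRC2 + 1, C ends up set when no borrow occurs, as in the example.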
Multiplication: mul(24) 
Todo: write me 
mul [CDST] DST u16/s16 SRC1 u16/s16 SRC2


DST is 32-bit, SRC1 and SRC2 are 16-bit.


b32 s1, s2;
if (src1 signed)
    s1 = SEX(SRC1);
else
    s1 = ZEX(SRC1);
if (src2 signed)
    s2 = SEX(SRC2);
else
    s2 = ZEX(SRC2);
b32 res = s1*s2; // modulo 2^32
CDST.O = 0;
CDST.C = 0;
CDST.S = S(res);
CDST.Z = res == 0;
DST = res;
Short/imm: 0x40000000 base opcode
0x00008000 src1 is signed
0x00000100 src2 is signed
operands: SDST, SHSRC/SHSHARED, SHSRC2/SHCONST/IMM
Long: 0x40000000 0x00000000 base opcode
0x00000000 0x00008000 src1 is signed
0x00000000 0x00004000 src2 is signed
operands: MCDST, LLDST, LHSRC1/LHSHARED, LHSRC2/LHCONST2

mul [CDST] DST [high] u24/s24 SRC1 SRC2


All operands are 32-bit.


b48 s1, s2;
if (signed) {
    s1 = SEX((b24)SRC1);
    s2 = SEX((b24)SRC2);
} else {
    s1 = ZEX((b24)SRC1);
    s2 = ZEX((b24)SRC2);
}
b48 m = s1*s2; // modulo 2^48
b32 res = (high ? m >> 16 : m & 0xffffffff);
CDST.O = 0;
CDST.C = 0;
CDST.S = S(res);
CDST.Z = res == 0;
DST = res;
Short/imm: 0x40000000 base opcode
0x00008000 src are signed
0x00000100 high
operands: SDST, SSRC/SSHARED, SSRC2/SCONST/IMM
Long: 0x40000000 0x00000000 base opcode
0x00000000 0x00008000 src are signed
0x00000000 0x00004000 high
operands: MCDST, LLDST, LSRC1/LSHARED, LSRC2/LCONST2
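Both multiplication variants can be sketched in Python as below; the 24-bit form shows how `high` selects bits 16-47 of the 48-bit product. `mul16`, `mul24` and `_sext` are hypothetical names, and only DST is returned, since O and C are always 0 and S/Z follow directly from DST.

```python
def _sext(x, bits):
    """Sign-extend the low `bits` bits of x (two's complement)."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def mul16(src1, src2, src1_signed, src2_signed):
    s1 = _sext(src1, 16) if src1_signed else src1 & 0xffff
    s2 = _sext(src2, 16) if src2_signed else src2 & 0xffff
    return (s1 * s2) & 0xffffffff                  # modulo 2^32


def mul24(src1, src2, signed, high):
    if signed:
        s1, s2 = _sext(src1, 24), _sext(src2, 24)
    else:
        s1, s2 = src1 & 0xffffff, src2 & 0xffffff  # the (b24) truncation
    m = (s1 * s2) & ((1 << 48) - 1)                # modulo 2^48
    return m >> 16 if high else m & 0xffffffff


print(hex(mul24(0xffffff, 0xffffff, False, False)))  # 0xfe000001
print(hex(mul24(0xffffff, 0xffffff, False, True)))   # 0xfffffe00
```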


Multiply-add: madd(24), msub(24), msubr(24), maddc(24) 


Todo: write me 


addop [CDST] DST mul u16 SRC1 SRC2 SRC3             O1=0 O2=000 S2=0 S1=0
addop [CDST] DST mul s16 SRC1 SRC2 SRC3             O1=0 O2=001 S2=0 S1=1
addop sat [CDST] DST mul s16 SRC1 SRC2 SRC3         O1=0 O2=010 S2=1 S1=0
addop [CDST] DST mul u24 SRC1 SRC2 SRC3             O1=0 O2=011 S2=1 S1=1
addop [CDST] DST mul s24 SRC1 SRC2 SRC3             O1=0 O2=100
addop sat [CDST] DST mul s24 SRC1 SRC2 SRC3         O1=0 O2=101
addop [CDST] DST mul high u24 SRC1 SRC2 SRC3        O1=0 O2=110
addop [CDST] DST mul high s24 SRC1 SRC2 SRC3        O1=0 O2=111
addop sat [CDST] DST mul high s24 SRC1 SRC2 SRC3    O1=1 O2=000

addop is one of:

add     O3=00 S4=0 S3=0
sub     O3=01 S4=0 S3=1
subr    O3=10 S4=1 S3=0
addc    O3=11 S4=1 S3=1

If addop is addc, insn also takes an additional COND parameter. SRC1 and
SRC2 are 16-bit for u16/s16 variants, 32-bit for u24/s24 variants. DST and
SRC3 are always 32-bit. Only a few of the variants are encodable as
short/immediate, and they're restricted to DST=SRC3.

if (u24 || s24) {
    b48 s1, s2;
    if (s24) {
        s1 = SEX((b24)SRC1);
        s2 = SEX((b24)SRC2);
    } else {
        s1 = ZEX((b24)SRC1);
        s2 = ZEX((b24)SRC2);
    }
    b48 m = s1*s2; // modulo 2^48
    b32 mres = (high ? m >> 16 : m & 0xffffffff);
} else {
    b32 s1, s2;
    if (s16) {
        s1 = SEX(SRC1);
        s2 = SEX(SRC2);
    } else {
        s1 = ZEX(SRC1);
        s2 = ZEX(SRC2);
    }
    b32 mres = s1*s2; // modulo 2^32
}
b32 s1, s2;
bool c;
switch (OP) {
    case add: s1 = mres, s2 = SRC3, c = 0; break;
    case sub: s1 = mres, s2 = ~SRC3, c = 1; break;
    case subr: s1 = ~mres, s2 = SRC3, c = 1; break;
    case addc: s1 = mres, s2 = SRC3, c = COND.C; break;
}
res = s1+s2+c; // infinite precision
CDST.C = res >> 32;
res = res & 0xffffffff;
CDST.O = (S(s1) == S(s2)) && (S(s1) != S(res));
if (sat && CDST.O)
    if (S(res)) res = 0x7fffffff;
    else res = 0x80000000;
CDST.S = S(res);
CDST.Z = res == 0;
DST = res;

Short/imm: 0x60000000 base opcode
0x00000100 S1
0x00008000 S2
0x00400000 S3
0x10000000 S4
operands: SDST, S*SRC/S*SHARED, S*SRC2/S*CONST/IMM, SDST, $c0


2.9. PGRAPH: 2d/3d graphics and compute engine 




Long: 0x60000000 0x00000000 base opcode 
0x10000000 0x00000000 O1 
0x00000000 0xe0000000 02 
0x00000000 0x0c000000 03 
operands: MCDST, LLDST, L*SRC1/L*SHARED, L*SRC2/L*CONST2, L*SRC3/L*CONST3, COND


Sum of absolute differences: sad, hsad 


Todo: write me 


sad [CDST] DST u16/s16/u32/s32 SRC1 SRC2 SRC3


Short variant is restricted to DST same as SRC3. All operands are 32-bit or
16-bit according to size specifier.


int s1, s2; // infinite precision
if (signed) {
    s1 = SEX(SRC1);
    s2 = SEX(SRC2);
} else {
    s1 = ZEX(SRC1);
    s2 = ZEX(SRC2);
}
b32 mres = abs(s1-s2); // modulo 2^32
res = mres+SRC3; // infinite precision
CDST.C = res >> (b32 ? 32 : 16);
res = res & (b32 ? 0xffffffff : 0xffff);
CDST.O = (S(mres) == S(SRC3)) && (S(mres) != S(res));
CDST.S = S(res);
CDST.Z = res == 0;
DST = res;
Short: 0x50000000 base opcode
0x00008000 0: b16, 1: b32
0x00000100 src are signed
operands: SDST, S*SRC/S*SHARED, S*SRC2/S*CONST, SDST

Long: 0x50000000 0x00000000 base opcode
0x00000000 0x04000000 0: b16, 1: b32
0x00000000 0x08000000 src are signed
operands: MCDST, LLDST, L*SRC1/L*SHARED, L*SRC2/L*CONST2, L*SRC3/L*CONST3
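A sketch of the sad data path in Python, returning DST and CDST.C only (O, S and Z are omitted for brevity). It takes SRC3 as the addend, as the pseudocode above does, and assumes the `bits`/`signed` parameters stand in for the size and signedness specifiers; `sad` is a hypothetical name.

```python
def sad(src1, src2, src3, bits=32, signed=False):
    """Model of sad; returns (DST, CDST.C)."""
    mask = (1 << bits) - 1

    def ext(v):
        # SEX for signed variants, ZEX otherwise
        v &= mask
        return v - (1 << bits) if signed and v >> (bits - 1) else v

    mres = abs(ext(src1) - ext(src2)) & 0xffffffff  # modulo 2^32
    res = mres + (src3 & mask)                      # infinite precision
    return res & mask, res >> bits


print(sad(3, 10, 100))                          # (107, 0)
print(sad(0xffff, 1, 0, bits=16, signed=True))  # (2, 0)
```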


Min/max selection: (h)min, (h)max 


Todo: write me 


min u16/u32/s16/s32 [CDST] DST SRC1 SRC2
max u16/u32/s16/s32 [CDST] DST SRC1 SRC2


All operands are 32-bit or 16-bit according to size specifier.


if (SRC1 < SRC2) { // signed comparison for s16/s32, unsigned for u16/u32.
    res = (min ? SRC1 : SRC2);
} else {
    res = (min ? SRC2 : SRC1);
}
CDST.O = 0;
CDST.C = 0;
CDST.S = S(res);
CDST.Z = res == 0;
DST = res;


Long: 0x30000000 0x80000000 base opcode
0x00000000 0x20000000 0: max, 1: min
0x00000000 0x08000000 0: u16/u32, 1: s16/s32
0x00000000 0x04000000 0: b16, 1: b32
operands: MCDST, LL*DST, L*SRC1/L*SHARED, L*SRC2/L*CONST2
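The only subtle part of min/max is that the same bit pattern compares differently for signed and unsigned variants. A minimal Python sketch, assuming `bits` and `signed` model the size specifier; `minmax` is a hypothetical name:

```python
def minmax(op, src1, src2, bits=32, signed=False):
    """Model of min/max; op is "min" or "max"."""
    mask = (1 << bits) - 1

    def key(v):
        # signed comparison for s16/s32, unsigned for u16/u32
        v &= mask
        return v - (1 << bits) if signed and v >> (bits - 1) else v

    if key(src1) < key(src2):
        res = src1 if op == "min" else src2
    else:
        res = src2 if op == "min" else src1
    return res & mask


print(hex(minmax("min", 0xffffffff, 1)))               # 0x1
print(hex(minmax("min", 0xffffffff, 1, signed=True)))  # 0xffffffff
```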


Comparison: set, hset 


Todo: write me 


set [CDST] DST cond u16/s16/u32/s32 SRC1 SRC2
cond can be any subset of {l, g, e}.
All operands are 32-bit or 16-bit according to size specifier.


int s1, s2; // infinite precision
if (signed) {
    s1 = SEX(SRC1);
    s2 = SEX(SRC2);
} else {
    s1 = ZEX(SRC1);
    s2 = ZEX(SRC2);
}
bool c;
if (s1 < s2)
    c = cond.l;
else if (s1 == s2)
    c = cond.e;
else /* s1 > s2 */
    c = cond.g;

if (c) {
    res = (b32 ? 0xffffffff : 0xffff);
} else {
    res = 0;
}
CDST.O = 0;
CDST.C = 0;
CDST.S = S(res);
CDST.Z = res == 0;
DST = res;

Long: 0x30000000 0x60000000 base opcode
0x00000000 0x08000000 0: u16/u32, 1: s16/s32
0x00000000 0x04000000 0: b16, 1: b32
0x00000000 0x00010000 cond.g
0x00000000 0x00008000 cond.e
0x00000000 0x00004000 cond.l
operands: MCDST, LL*DST, L*SRC1/L*SHARED, L*SRC2/L*CONST2
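The three cond bits act as a lookup on the outcome of the comparison. A minimal Python sketch, assuming cond is given as a set of the letters "l", "e" and "g"; `iset` is a hypothetical name:

```python
def iset(cond, src1, src2, bits=32, signed=False):
    """Model of set; cond is any subset of {"l", "e", "g"}."""
    mask = (1 << bits) - 1

    def ext(v):
        v &= mask
        return v - (1 << bits) if signed and v >> (bits - 1) else v

    s1, s2 = ext(src1), ext(src2)
    if s1 < s2:
        c = "l" in cond
    elif s1 == s2:
        c = "e" in cond
    else:
        c = "g" in cond
    return mask if c else 0  # all ones for true, 0 for false


print(hex(iset({"l", "e"}, 3, 3)))  # 0xffffffff
```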


Bitwise operations: (h)and, (h)or, (h)xor, (h)mov2 


Todo: write me 


and b32/b16 [CDST] DST [not] SRC1 [not] SRC2     O2=0, O1=0
or b32/b16 [CDST] DST [not] SRC1 [not] SRC2      O2=0, O1=1
xor b32/b16 [CDST] DST [not] SRC1 [not] SRC2     O2=1, O1=0
mov2 b32/b16 [CDST] DST [not] SRC1 [not] SRC2    O2=1, O1=1


Immediate forms only allow 32-bit operands, and cannot negate the second
operand.


s1 = (not1 ? ~SRC1 : SRC1);
s2 = (not2 ? ~SRC2 : SRC2);
switch (OP) {
    case and: res = s1 & s2; break;
    case or: res = s1 | s2; break;
    case xor: res = s1 ^ s2; break;
    case mov2: res = s2; break;
}
CDST.O = 0;
CDST.C = 0;
CDST.S = S(res);
CDST.Z = res == 0;
DST = res;

Imm: 0xd0000000 base opcode
0x00400000 not1
0x00008000 O2 bit
0x00000100 O1 bit
operands: SDST, SSRC/SSHARED, IMM
assumed: not2=0 and b32.

Long: 0xd0000000 0x00000000 base opcode
0x00000000 0x04000000 0: b16, 1: b32
0x00000000 0x00020000 not2
0x00000000 0x00010000 not1
0x00000000 0x00008000 O2 bit
0x00000000 0x00004000 O1 bit
operands: MCDST, LL*DST, L*SRC1/L*SHARED, L*SRC2/L*CONST2
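The optional source negations are applied before the operation itself, which a short Python sketch makes explicit. It assumes `bits` models the size specifier; `logic` is a hypothetical name:

```python
def logic(op, src1, src2, not1=False, not2=False, bits=32):
    """Model of and/or/xor/mov2 with optional source negation."""
    mask = (1 << bits) - 1
    s1 = ~src1 & mask if not1 else src1 & mask
    s2 = ~src2 & mask if not2 else src2 & mask
    if op == "and":
        return s1 & s2
    if op == "or":
        return s1 | s2
    if op == "xor":
        return s1 ^ s2
    if op == "mov2":
        return s2  # SRC1 is ignored
    raise ValueError(op)


print(hex(logic("mov2", 0, 0x1234, not2=True, bits=16)))  # 0xedcb
```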
Bit shifts: (h)shl, (h)shr, (h)sar


Todo: write me


shl b16/b32 [CDST] DST SRC1 SRC2
shl b16/b32 [CDST] DST SRC1 SHCNT
shr u16/u32 [CDST] DST SRC1 SRC2
shr u16/u32 [CDST] DST SRC1 SHCNT
shr s16/s32 [CDST] DST SRC1 SRC2
shr s16/s32 [CDST] DST SRC1 SHCNT
All operands are 16/32-bit according to size specifier, except SHCNT. Shift
counts are always treated as unsigned, so passing a negative value to shl
doesn't get you a shr.


int size = (b32 ? 32 : 16);
if (shl) {
    res = SRC1 << SRC2; // infinite precision, shift count doesn't wrap.
    if (SRC2 < size) { // yes, <. So if you shift 1 left by 32 bits, you DON'T get
                       // CDST.C set, but shift 2 left by 31 bits, and it gets set just fine.
        CDST.C = (res >> size) & 1; // basically, the bit that got shifted out.
    } else {
        CDST.C = 0;
    }
    res = res & (b32 ? 0xffffffff : 0xffff);
} else {
    res = SRC1 >> SRC2; // infinite precision, shift count doesn't wrap.
    if (signed && S(SRC1)) {
        if (SRC2 < size)
            res |= (1<<size)-(1<<(size-SRC2)); // fill out the upper bits with 1's.
        else
            res |= (1<<size)-1;
    }
    if (SRC2 < size && SRC2 > 0) {
        CDST.C = (SRC1 >> (SRC2-1)) & 1;
    } else {
        CDST.C = 0;
    }
}
if (SRC2 == 1) {
    CDST.O = (S(SRC1) != S(res));
} else {
    CDST.O = 0;
}
CDST.S = S(res);
CDST.Z = res == 0;
DST = res;

Long: 0x30000000 0xc0000000 base opcode
0x00000000 0x20000000 0: shl, 1: shr
0x00000000 0x08000000 0: u16/u32, 1: s16/s32 [shr only]
0x00000000 0x04000000 0: b16, 1: b32
0x00000000 0x00010000 0: use SRC2, 1: use SHCNT
operands: MCDST, LL*DST, L*SRC1/L*SHARED, L*SRC2/L*CONST2/SHCNT
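The carry quirk noted in the shl comment (shifting 1 left by 32 leaves C clear, shifting 2 left by 31 sets it) is easy to check with a direct Python transcription. This sketch assumes the overflow rule applies to both shift directions, which is how the pseudocode above is laid out; `shift` is a hypothetical name.

```python
def shift(op, src1, cnt, bits=32, signed=False):
    """Model of shl/shr; cnt is unsigned and never wraps. Returns (DST, flags)."""
    mask = (1 << bits) - 1
    src1 &= mask

    def sign(v):
        return (v >> (bits - 1)) & 1

    if op == "shl":
        res = src1 << cnt                           # infinite precision
        c = (res >> bits) & 1 if cnt < bits else 0  # last bit shifted out
        res &= mask
    else:
        res = src1 >> cnt
        if signed and sign(src1):
            # fill the vacated upper bits with 1's
            res |= (1 << bits) - (1 << (bits - cnt)) if cnt < bits else mask
        c = (src1 >> (cnt - 1)) & 1 if 0 < cnt < bits else 0
    o = int(sign(src1) != sign(res)) if cnt == 1 else 0
    return res, {"C": c, "O": o, "S": sign(res), "Z": int(res == 0)}


print(shift("shl", 1, 32)[1]["C"], shift("shl", 2, 31)[1]["C"])  # 0 1
print(hex(shift("shr", 0x80000000, 4, signed=True)[0]))          # 0xf8000000
```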




Floating point instructions 


Contents 


* Floating point instructions
  - Introduction
  - Addition: fadd
  - Multiplication: fmul
  - Multiply+add: fmad
  - Min/max: fmin, fmax
  - Comparison: fset
  - Selection: fslct


Introduction 


Todo: write me 


Addition: fadd 


Todo: write me 


add [sat] rn/rz f32 DST SRC1 SRC2


Adds two floating point numbers together. 


Multiplication: fmul 


Todo: write me 


mul [sat] rn/rz f32 DST SRC1 SRC2


Multiplies two floating point numbers together.


Multiply-add: fmad


Todo: write me 
add f32 DST mul SRC1 SRC2 SRC3


A multiply-add instruction, with intermediate rounding: the product is rounded
before the addition. Nothing interesting. DST = SRC1 * SRC2 + SRC3;
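To make "intermediate rounding" concrete, the Python sketch below emulates binary32 rounding with the standard `struct` module and compares fmad against a single final rounding. Round-to-nearest-even is assumed, since this section does not state fmad's rounding mode; the inputs are chosen so the f32 product lands exactly halfway between two f32 values. `f32`, `fmad` and `fma_once` are hypothetical names.

```python
import struct


def f32(x):
    """Round a binary64 value to the nearest binary32 (ties to even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fmad(a, b, c):
    # product rounded to f32 first, then the sum is rounded again
    return f32(f32(a * b) + c)


def fma_once(a, b, c):
    # single final rounding; the binary64 arithmetic is exact for the
    # inputs used below, so this equals the exactly rounded result
    return f32(a * b + c)


a = b = 1 + 2**-12   # exactly representable in f32
c = -(1 + 2**-11)
print(fmad(a, b, c), fma_once(a, b, c) == 2**-24)  # 0.0 True
```

The product 1 + 2^-11 + 2^-24 is a tie in f32 and rounds to 1 + 2^-11, so the intermediate rounding throws away exactly the bit that a fused operation would keep.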


Min/max: fmin, fmax 


Todo: write me 


min f32 DST SRC1 SRC2
max f32 DST SRC1 SRC2


Sets DST to the smaller/larger of the two source operands. If one operand is NaN,
DST is set to the non-NaN operand. If both are NaN, DST is set to NaN.
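The NaN rule above amounts to "ignore a NaN operand unless both are NaN". A minimal Python sketch; which zero is returned for min(+0, -0) is not covered by the text and is left to Python's comparison here. `fmin` and `fmax` are hypothetical names.

```python
import math


def fmin(a, b):
    """Smaller operand; a NaN operand is ignored unless both are NaN."""
    if math.isnan(a):
        return b            # NaN when b is NaN as well
    if math.isnan(b):
        return a
    return a if a <= b else b


def fmax(a, b):
    """Larger operand; a NaN operand is ignored unless both are NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a >= b else b


print(fmin(float("nan"), 2.0), fmax(1.0, float("nan")))  # 2.0 1.0
```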


Comparison: fset 


Todo: write me 


set [CDST] DST <cmpop> f32 SRC1 SRC2


Does the given comparison operation on SRC1 and SRC2. DST is set to 0xffffffff
if the comparison evaluates true, 0 if it evaluates false. If used, CDST.SZ are
set according to DST.


Selection: fslct 


Todo: write me 


slct b32 DST SRC1 SRC2 f32 SRC3


Sets DST to SRC1 if SRC3 is positive or 0, to SRC2 if SRC3 is negative or NaN.
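A one-line Python model of slct, mainly to pin down the NaN case. It assumes -0.0 counts as "0" and therefore selects SRC1, which the text does not state explicitly; `slct` is a hypothetical name.

```python
import math


def slct(src1, src2, src3):
    """DST = SRC1 if SRC3 >= 0, SRC2 if SRC3 is negative or NaN."""
    if not math.isnan(src3) and src3 >= 0.0:  # -0.0 >= 0.0 is true here
        return src1
    return src2


print(slct(1, 2, 0.0), slct(1, 2, float("nan")))  # 1 2
```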


Transcendental instructions


Contents


* Transcendental instructions
  - Introduction
  - Preparation: pre
  - Reciprocal: rcp
  - Reciprocal square root: rsqrt
  - Base-2 logarithm: lg2
  - Sine/cosine: sin, cos
  - Base-2 exponential: ex2


Introduction 


Todo: write me 


Preparation: pre 


Todo: write me 


presin f32 DST SRC 
preex2 f32 DST SRC 


Preprocesses a float argument for use in subsequent sin/cos or ex2 
operation, respectively. 


Reciprocal: rcp 


Todo: write me 


rcp f32 DST SRC


Computes 1/x. 


Reciprocal square root: rsqrt 


Todo: write me 


rsqrt f32 DST SRC 


Computes 1/sqrt(x).
Base-2 logarithm: lg2


Todo: write me


lg2 f32 DST SRC


Computes log_2(x). 


Sine/cosine: sin, cos


Todo: write me 


sin f32 DST SRC
cos f32 DST SRC


Computes sin(x) or cos(x), needs argument preprocessed by pre.sin. 


Base-2 exponential: ex2 


Todo: write me 


ex2 f32 DST SRC


Computes 2**x, needs argument preprocessed by pre.ex2. 


Double precision floating point instructions 


Contents 


* Double precision floating point instructions
  - Introduction
  - Addition: dadd
  - Multiplication: dmul
  - Fused multiply-add: dfma
  - Min/max: dmin, dmax
  - Comparison: dset


Introduction 
Todo: write me


Addition: dadd 


Todo: write me 


Multiplication: dmul 


Todo: write me 


Fused multiply-add: dfma


Todo: write me 


fma f64 DST SRC1 SRC2 SRC3


Fused multiply-add, with no intermediate rounding. 
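For contrast with the f32 fmad, the sketch below computes a fused multiply-add in binary64 by doing the arithmetic exactly with `fractions.Fraction` and rounding once when converting back to float (Python's Fraction-to-float conversion is correctly rounded, round-to-nearest-even). `dfma` is a hypothetical name; the inputs are chosen so the unfused product rounds away its low bit.

```python
from fractions import Fraction


def dfma(a, b, c):
    """a*b + c computed exactly, then rounded once to binary64."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a, b, c = 1 + 2**-27, 1 - 2**-27, -1.0
print(a * b + c)                 # 0.0 (the product rounds to 1.0 first)
print(dfma(a, b, c) == -2**-54)  # True
```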


Min/max: dmin, dmax 


Todo: write me 


min f64 DST SRC1 SRC2
max f64 DST SRC1 SRC2


Sets DST to the smaller/larger of the two source operands. If one operand is NaN,
DST is set to the non-NaN operand. If both are NaN, DST is set to NaN.


Comparison: dset 


Todo: write me 


set [CDST] DST <cmpop> f64 SRC1 SRC2


Does the given comparison operation on SRC1 and SRC2. DST is set to 0xffffffff
if the comparison evaluates true, 0 if it evaluates false. If used, CDST.SZ are
set according to DST.
Control instructions


Contents 


* Control instructions
  - Introduction
  - Halting program execution: exit
  - Branching: bra
  - Indirect branching: bra c[]
  - Setting up a rejoin point: joinat
  - Rejoining execution paths: join
  - Preparing a loop: prebrk
  - Breaking out of a loop: brk
  - Calling subroutines: call
  - Returning from a subroutine: ret
  - Pushing a return address: preret
  - Aborting execution: trap
  - Debugger breakpoint: brkpt
  - Enabling whole-quad mode: quadon, quadpop
  - Discarding fragments: discard
  - Block thread barriers: bar


Introduction 


Todo: write me 


Halting program execution: exit 


Todo: write me 


exit 


Not actually a separate instruction, just a modifier available on all
long insns. Finishes the thread's execution after the current insn ends.


Branching: bra 
Todo: write me


bra <code target>


Branches to the given place in the code. If only some subset of threads 
in the current warp executes it, one of the paths is chosen as the active 
one, and the other is suspended until the active path exits or rejoins. 


Indirect branching: bra c[] 


Todo: write me 


Setting up a rejoin point: joinat 


Todo: write me 


joinat <code target>


The argument is the address of a future join instruction; it gets pushed
onto the stack, together with a mask of currently active threads, for
future rejoining.


Rejoining execution paths: join 


Todo: write me 


join 


Also a modifier. Switches to other diverged execution paths on the same 
stack level, until they've all reached the join point, then pops off the 
entry and continues execution with a rejoined path. 
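The stack behaviour described for joinat, bra and join can be illustrated with a toy warp model. This is a hypothetical sketch, not the hardware algorithm: it assumes the taken path always runs first, stores the suspended path as its own stack entry, and ignores exit, call and brk. The `Warp` class and the "join"/"div" entry tags are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Warp:
    pc: int
    active: int                                # mask of running threads
    stack: list = field(default_factory=list)  # (kind, pc, mask) tuples

    def joinat(self, target):
        # remember the join address and the currently active threads
        self.stack.append(("join", target, self.active))
        self.pc += 1

    def bra(self, target, cond_mask):
        taken = self.active & cond_mask
        not_taken = self.active & ~cond_mask
        if taken and not_taken:
            # divergence: run the taken path, suspend the fall-through path
            self.stack.append(("div", self.pc + 1, not_taken))
            self.active, self.pc = taken, target
        elif taken:
            self.pc = target
        else:
            self.pc += 1

    def join(self):
        kind, pc, mask = self.stack.pop()
        if kind == "div":
            # a suspended path has not reached the join point yet: switch to it
            self.active, self.pc = mask, pc
        else:
            # all paths arrived: restore the mask saved by joinat
            self.active, self.pc = mask, self.pc + 1
```

For example, after joinat at pc 0 and a branch at pc 1 taken only by threads 0-1, the first join resumes threads 2-3 at pc 2, and the second one restores all four threads.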


Preparing a loop: prebrk 


Todo: write me 


breakaddr <code target>


Like call, except that it doesn't branch anywhere, uses the given operand as the
return address, and pushes a different type of entry onto the stack.
Breaking out of a loop: brk


Todo: write me 


break 


Like ret, except accepts breakaddr's stack entry type, not call's. 


Calling subroutines: call 


Todo: write me 


call <code target>


Pushes the address of the next insn onto the stack and branches to the given place.
Cannot be predicated. 


Returning from a subroutine: ret 


Todo: write me 


ret 


Returns from a called function. If there's some not-yet-returned divergent 
path on the current stack level, switches to it. Otherwise pops off the 
entry from stack, rejoins all the paths to the pre-call state, and 
continues execution from the return address on stack. Accepts predicates. 


Pushing a return address: preret 


Todo: write me 


Aborting execution: trap 


Todo: write me 


trap 


Causes an error, killing the program instantly. 
Debugger breakpoint: brkpt


Todo: write me 


brkpt 


Doesn't seem to do anything; probably generates a breakpoint when enabled
somewhere in PGRAPH, somehow.


Enabling whole-quad mode: quadon, quadpop 


Todo: write me 


quadon 


Temporarily enables all threads in the current quad, even if they were
disabled before [by diverging, exiting, or not getting started at all].
Nesting this is probably a bad idea, and so is using any non-quadpop
control insns while this is active. For diverged threads, the saved PC
is unaffected by this temporary enabling.


quadpop


Undoes a previous quadon command.


Discarding fragments: discard 


Todo: write me 


Block thread barriers: bar 


Todo: write me 


bar sync <barrier number>


Waits until all threads in the block arrive at the barrier, then continues
execution... probably... somehow...


Texture instructions 
Contents


* Texture instructions
  - Introduction
  - Automatic texture load: texauto
  - Raw texel fetch: texfetch
  - Texture load with LOD bias: texbias
  - Texture load with manual LOD: texlod
  - Texture size query: texsize
  - Texture cube calculations: texprep
  - Texture LOD query: texquerylod
  - Texture CSAA load: texcsaa
  - Texture quad load: texgather


Introduction 


Todo: write me 


Automatic texture load: texauto 


Todo: write me 


texauto [deriv] live/all <texargs> 


Does a texture fetch. Inputs are: x, y, z, array index, dref [skip all
that your current sampler setup doesn't use]. x, y, z, dref are floats,
array index is integer. If running in FP or the deriv flag is on,
derivatives are computed based on coordinates in all threads of the current
quad. Otherwise, derivatives are assumed 0. For FP, if the live flag
is on, the tex instruction is only run for fragments that are going to
be actually written to the render target, i.e. for ones that are inside
the rendered primitive and haven't been discarded yet. The all flag executes
the tex even for non-visible fragments, which is needed if they're going
to be used for further derivatives, explicit or implicit.


Raw texel fetch: texfetch 


Todo: write me 
texfetch live/all <texargs>


A single-texel fetch. The inputs are x, y, z, index, lod, and are all
integer.


Texture load with LOD bias: texbias 


Todo: write me 


texbias [deriv] live/all <texargs> 


Same as texauto, except takes an additional [last] float input specifying 
the LOD bias to add. Note that bias needs to be the same for all threads 
in the current quad executing the texbias insn. 


Texture load with manual LOD: texlod 


Todo: write me 


Does a texture fetch with given coordinates and LOD. Inputs are like 
texbias, except you have explicit LOD instead of the bias. Just like 
in texbias, the LOD should be the same for all threads involved. 


Texture size query: texsize 


Todo: write me 


texsize live/all <texargs> 


Gives you (width, height, depth, mipmap level count) in output, takes 
integer LOD parameter as its only input. 


Texture cube calculations: texprep 


Todo: write me 


Texture LOD query: texquerylod 


Todo: write me 
Texture CSAA load: texcsaa


Todo: write me 


Texture quad load: texgather 


Todo: write me 


Misc instructions 


Contents 


* Misc instructions
  - Introduction
  - Data conversion: cvt
  - Attribute interpolation: interp
  - Intra-quad data movement: quadop
  - Intra-warp voting: vote
  - Vertex stream output control: emit, restart
  - Nop / PM event triggering: nop, pmevent


Introduction 


Todo: write me 


Data conversion: cvt 


Todo: write me 


cvt <integer dst> <integer src>
cvt <integer rounding modifier> <integer dst> <float src>
cvt <rounding modifier> <float dst> <integer src>
cvt <rounding modifier> <float dst> <float src>
cvt <integer rounding modifier> <float dst> <float src>


Converts between formats. For integer destinations, the result is always
clamped to the target type's range.
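The clamping rule for integer destinations can be sketched in Python as follows. Truncation toward zero is assumed for the float-to-integer case purely for illustration, since the integer rounding modifiers are not documented here; NaN and infinities are not modelled. `cvt_int` and `cvt_f2i_trunc` are hypothetical names.

```python
import math


def cvt_int(value, bits, signed):
    """Clamp an integer to the range of a bits-wide signed/unsigned type."""
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    return max(lo, min(hi, value))


def cvt_f2i_trunc(x, bits, signed):
    """Float to integer, rounding toward zero, then clamping.

    math.trunc raises on NaN and infinities, which are not modelled here.
    """
    return cvt_int(math.trunc(x), bits, signed)


print(cvt_int(300, 8, False), cvt_f2i_trunc(-2.7, 32, True))  # 255 -2
```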
Attribute interpolation: interp


Todo: write me 


interp [cent] [flat] DST v[] [SRC]


Gets interpolated FP input, optionally multiplying it by a given value.


Intra-quad data movement: quadop 


Todo: write me 


quadop f32 <op1> <op2> <op3> <op4> DST <src lane> SRC1 SRC2


Intra-quad information exchange instruction. Mad as a hatter.
First, SRC1 is taken from the given lane in the current quad. Then
op<currentlanenumber> is executed on it and SRC2, and the results get
written to DST. ops can be add [SRC1+SRC2], sub [SRC1-SRC2],
subr [SRC2-SRC1], mov2 [SRC2]. srclane can be at least l0, l1,
l2, l3, and these work everywhere. If you're running in FP, it looks
like you can also use dox [use current lane number ^ 1] and doy
[use current lane number ^ 2], but using these elsewhere results
in always getting 0 as the result...
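The lane selection and per-lane ops can be sketched in Python as below. It assumes op1..op4 apply to lanes 0..3 in order and that dox/doy read from lane ^ 1 and lane ^ 2 as described above; the example computes a horizontal difference across each pair of lanes, the kind of thing dox appears intended for. `quadop` and `OPS` are hypothetical names.

```python
OPS = {
    "add": lambda a, b: a + b,   # SRC1 + SRC2
    "sub": lambda a, b: a - b,   # SRC1 - SRC2
    "subr": lambda a, b: b - a,  # SRC2 - SRC1
    "mov2": lambda a, b: b,      # SRC2
}


def quadop(ops, srclane, src1, src2):
    """ops[i] is applied in lane i; src1/src2 hold one value per lane."""
    out = []
    for lane in range(4):
        if srclane == "dox":
            src = lane ^ 1           # FP only, per the text above
        elif srclane == "doy":
            src = lane ^ 2
        else:
            src = int(srclane[1])    # "l0".."l3"
        out.append(OPS[ops[lane]](src1[src], src2[lane]))
    return out


x = [0.0, 1.0, 10.0, 11.0]  # hypothetical per-lane values
print(quadop(("sub", "subr", "sub", "subr"), "dox", x, x))  # [1.0, 1.0, 1.0, 1.0]
```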


Intra-warp voting: vote 


Todo: write me 


PREDICATE vote any/all CDST


This instruction doesn't use the predicate field for conditional execution,
abusing it instead as an input argument. vote any sets CDST to true iff the
input predicate evaluated to true in any of the warp's active threads.
vote all sets it to true iff the predicate evaluated to true in all active
threads of the current warp.
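Only the warp's active threads take part in the vote, which a short Python sketch makes explicit. It assumes the predicate is given as one boolean per lane and the active set as a bitmask; `vote` is a hypothetical name.

```python
def vote(kind, pred, active):
    """kind is "any" or "all"; pred holds one bool per lane."""
    votes = [p for lane, p in enumerate(pred) if active >> lane & 1]
    return any(votes) if kind == "any" else all(votes)


pred = [False] * 32
pred[3] = True
print(vote("any", pred, 0xffffffff), vote("any", pred, 0xfffffff7))  # True False
```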


Vertex stream output control: emit, restart 


Todo: write me 


emit 


GP-only instruction that emits the current contents of $o registers as the
next vertex in the output primitive and clears $o for some reason.

restart


GP-only instruction that finishes the current output primitive and starts
a new one.


Nop / PM event triggering: nop, pmevent


Todo: write me 


Per-MP performance counters 


Contents 


* Per-MP performance counters
  - Introduction


Introduction 


Todo: write me 


Vertex fetch: VFETCH 


Contents 


* Vertex fetch: VFETCH 
* PCOUNTER signals 


Todo: write me 


PCOUNTER signals 


Mux 0:
* 0x0e: geom_vertex_in_count[0]
* 0x0f: geom_vertex_in_count[1]
* 0x10: geom_vertex_in_count[2]
* 0x19: CG_IFACE_DISABLE [G80]

Mux 1:
* 0x02: input_assembler_busy[0]
* 0x03: input_assembler_busy[1]
* 0x08: geom_primitive_in_count
* 0x0b: input_assembler_waits_for_fb [G200:]
* 0x0e: input_assembler_waits_for_fb [G80:G200]
* 0x14: input_assembler_busy[2] [G200:]
* 0x15: input_assembler_busy[3] [G200:]
* 0x17: input_assembler_busy[2] [G80:G200]
* 0x18: input_assembler_busy[3] [G80:G200]

Mux 2 [G84:]:
* 0x00: CG[0]
* 0x01: CG[1]
* 0x02: CG[2]


Pre-ROP: PROP


Contents


* Pre-ROP: PROP
* PCOUNTER signals


Todo: write me 


PCOUNTER signals 


* 0x00: shaded_pixel_count...?
* 0x03:
  - 2: rop_busy[0]
  - 3: rop_busy[1]
  - 4: rop_busy[2]
  - 5: rop_busy[3]
  - 6: rop_waits_for_shader[0]
  - 7: rop_waits_for_shader[1]
* 0x15:
  - 0-5: rop_samples_in_count_1
  - 6: rop_samples_in_count_0[0]
  - 7: rop_samples_in_count_0[1]
* 0x16:
  - 0-5: rasterizer_pixels_out_count_1
  - 6: rasterizer_pixels_out_count_0[0]
  - 7: rasterizer_pixels_out_count_0[1]
* 0x1a:
  - 0-5: rop_samples_killed_by_earlyz_count
* 0x1b:
  - 0-5: rop_samples_killed_by_latez_count
* 0x1c: shaded_pixel_count...?
* 0x1d: shaded_pixel_count...?
* 0x1e:
  - 0: CG_IFACE_DISABLE [G80]
  - 0: CG[0] [G84:]
  - 1: CG[1] [G84:]
  - 2: CG[2] [G84:]


Color raster output: CROP 


Contents 


* Color raster output: CROP 


* PCOUNTER signals 


Todo: write me 


PCOUNTER signals 


* 0x1:
  - 0: CG_IFACE_DISABLE [G80]
  - 2: rop_waits_for_fb[0]
  - 3: rop_waits_for_fb[1]
Zeta raster output: ZROP


Contents 


* Zeta raster output: ZROP 


* PCOUNTER signals 


Todo: write me 


PCOUNTER signals 


* 0x1:
  - 2: rop_waits_for_fb[0]
  - 3: rop_waits_for_fb[1]
* 0x4:
  - 1: CG_IFACE_DISABLE [G80]


2.9.12 Fermi graphics and compute engine


Contents: 


Fermi macro processor 


Contents 


* Fermi macro processor
  - Introduction
  - Registers


Introduction 


Todo: write me 


Registers 


Todo: write me 




404400+i*4, i<8: REG[]
404420: OPCODE [at PC]
404424: PC
404428: NEXTPC
40442c: STATE
    b0: ?
    b4: ?
    b8: ACTIVE
    b9: PARM/MADDR?
404430: ??? 117ffff
404434: ??? 17ffff
404438: ??? 13ffff
404460: ??? 7f
404464: ??? 7ff
404468: WATCHDOG_TIMEOUT [30-bit]
40446c: WATCHDOG_TIME [30-bit]
404480: ??? 3
404488: MCACHE_CTRL
40448c: MCACHE_DATA
404490: TRAP
    b0: TOO_FEW_PARAMS
    b1: TOO_MANY_PARAMS
    b2: ILLEGAL_OPCODE
    b3: DOUBLE_BRANCH
    b4: TIMEOUT
    b29: ?
    b30: CLEAR
    b31: ENABLE
404494: TRAP_PC and something?
404498: 1/11/0
40449c: TRAP_OPCODE?
4044a0: STATUS [0000000f - idle]


Fermi context switching units 


Contents: 


Fermi context switching units 


Todo: convert 


Present on:
    cc0: GF100:GK104
    cc1: GK104:GK208
    cc2: GK208:GM107
    cc3: GM107:
BAR0 address:
    HUB: 0x409000
    GPC: 0x502000 + idx * 0x8000
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PMC interrupt line: 222 PMC enable bit: 12 [all of PGRAPH] Version: 
сс0, cc1: 3 
cc2, cc3: 5 


Code segment size: HUB cc0: 0x4000 HUB ccl, сс2: 0x5000 HUB cc3: 0x6000 GPC cc0: 0x2000 GPC ccl, cc2: 
0x2800 GPC cc3: 0x3800 


Data segment size: HUB: 0x1000 GPC cc0-cc2: 0x800 GPC cc3: 0xc00 
Fifo size: HUB сс0-сс1: 0x10 HUB сс2-сс3: 0x8 GPC cc0-cc1: 0х8 GPC сс2-сс3: 0х4 
Xfer slots: 8 
Secretful: no 
Code TLB index bits: 8 
Code ports: 1 
Data ports: 
сс0, сс1: 1 
сс2, сс3: 4 
IO addressing type: indexed 
Core clock: 
HUB: hub clock [GF100 clock #9] 
GPC: GPC clock [GF100 clock 401 [XXX: divider] 
The IO register ranges: 


400/10000:500/14000 CC misc CTXCTL support [graph/gf100-ctxctl/intro.txt] 500/14000:600/18000 FIFO 
command FIFO submission [graph/gf100-ctxctl/intro.txt] 600/18000:700/1с000 MC PGRAPH master control 
[graph/gf100-ctxctl/intro.txt] 700/1с000:800/20000 MMIO MMIO bus access [graph/gf100-ctxctl/mmio.txt] 
800/20000:900/24000 MISC misc/unknown stuff [graph/gf100-ctxctl/intro.txt] 900/24000:400/28000 STRAND 
context strand control [graph/gf100-ctxctl/strand.txt] а00/28000:500/26000 MEMIF memory interface 
[graph/gf100-ctxctl/memif.txt] b00/2c000:c00/30000 CSREQ PFIFO switch requests [graph/gf100-ctxctl/intro.txt] 


related to MEMIF? [XXX] [GK104-] 
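Each register entry in this section gives an MMIO offset within the unit followed by its falcon I/O-space address. In every entry listed here the second number equals the first shifted left by 6 (0x400 → 0x10000, 0xb00 → 0x2c000), which suggests the conversion below. This is a sketch assuming that relation holds throughout and that the absolute BAR0 address is simply the unit base from the table plus the MMIO offset; `io_addr` and `bar0_addr` are hypothetical helper names.

```python
from typing import Optional

# Unit base addresses in BAR0, from the table above.
HUB_BASE = 0x409000
GPC_BASE = 0x502000
GPC_STRIDE = 0x8000


def io_addr(mmio_offset: int) -> int:
    """Falcon I/O-space address of a register, assuming io = mmio << 6."""
    return mmio_offset << 6


def bar0_addr(mmio_offset: int, gpc: Optional[int] = None) -> int:
    """Absolute BAR0 address: the HUB unit if gpc is None, otherwise GPC number `gpc`."""
    base = HUB_BASE if gpc is None else GPC_BASE + gpc * GPC_STRIDE
    return base + mmio_offset


if __name__ == "__main__":
    # GPCID register (618/18600G) in the context switching unit of GPC 1.
    print(hex(io_addr(0x618)), hex(bar0_addr(0x618, gpc=1)))
```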


Registers in CC range:
400/10000   INTR            interrupt signals
404/101xx   INTR_ROUTE      falcon interrupt routing
40c/1030x   BAR_REQMASK[0]  barrier required bits
410/1040x   BAR_REQMASK[1]  barrier required bits
414/1050x   BAR             barrier state
418/10600   BAR_SET[0]      set barrier bits, barrier 0
41c/10700   BAR_SET[1]      set barrier bits, barrier 1
420/10800   IDLE_STATUS     CTXCTL subunit idle status
424/10900   USER_BUSY       user busy flag
430/10c00   WATCHDOG        watchdog timer
484/12100H  ???             [XXX]


Registers in FIFO range:
500/14000   DATA            FIFO command argument
504/14100   CMD             FIFO command submission


Registers in MC range:
604/18100H  HUB_UNITS       PART/GPC count
608/18200G  GPC_UNITS       TPC/ZCULL count
60c/18300H  ???             [XXX]
610/18400H  ???             [XXX]
614/18500   RED_SWITCH      enable/power/pause master control
618/18600G  GPCID           the id of containing GPC
620/18800   UC_CAPS         falcon code and data size
698/1a600G  ???             [XXX]
69c/1a700G  ???             [XXX]


Registers in MISC range:
800/20000:820/20800  SCRATCH        scratch registers
820/20800:840/21000  SCRATCH_SET    set bits in scratch registers
840/21000:860/21800  SCRATCH_CLEAR  clear bits in scratch registers
86c/21b00   ???             related to strands? [XXX]
870/21c00   ???             [XXX]
874/21d00   ???             [XXX]
878/21e00   ???             [XXX]
880/22000   STRANDS         strand count
884/22100   ???             [XXX]
890/22400   ???             JOE? [XXX]
894/22500   ???             JOE? [XXX]
898/22600   ???             JOE? [XXX]
89c/22700   ???             JOE? [XXX]
8a0/22800   ???             [XXX]
8a4/22900   ???             [XXX]
8a8/22a00   ???             [XXX]
8b0/22c00   ???             [XXX] [GK104-]
8b4/22d00   ???             [XXX] [GK104-]


Registers in CSREQ range:
b00/2c000H  CHAN_CUR        current channel
b04/2c100H  CHAN_NEXT       next channel
b08/2c200H  INTR_EN         interrupt enable?
b0c/2c300H  INTR            interrupt
b80/2e000H  ???             [XXX]
b84/2e100H  ???             [XXX]


Registers in GRAPH range:
c00/30000H  CMD_STATUS      some PGRAPH status bits?
c08/30200H  CMD_TRIGGER     triggers misc commands to PGRAPH?
c14/305xxH  INTR_UP_ROUTE   upstream interrupt routing
c18/30600H  INTR_UP_STATUS  upstream interrupt status
c1c/30700H  INTR_UP_SET     upstream interrupt trigger
c20/30800H  INTR_UP_CLEAR   upstream interrupt clear
c24/30900H  INTR_UP_ENABLE  upstream interrupt enable [XXX: more bits on GK104]
c80/32000G  VSTATUS_0       subunit verbose status
c84/32100G  VSTATUS_1       subunit verbose status
c88/32200G  VSTATUS_2       subunit verbose status
c8c/32300G  VSTATUS_3       subunit verbose status
c90/32400G  TRAP            GPC trap status
c94/32500G  TRAP_EN         GPC trap enable


Interrupts:
0-7: standard falcon interrupts
8-15: controlled by INTR_ROUTE

[XXX: IO regs] [XXX: interrupts] [XXX: status bits]
[XXX: describe CTXCTL]


Signals 


0x00-0x1f: engine dependent [XXX]
0x20: ZERO - always 0
0x21: ??? - bit 9 of reg 0x128 of corresponding IBUS piece [XXX]
0x22: STRAND - strand busy executing command [graph/gf100-ctxctl/strand.txt]
0x23: ???, affected by RED_SWITCH [XXX]
0x24: IB_UNK40, last state of IB_UNK40 bit, from DISPATCH.SUBCH reg
0x25: MMCTX - MMIO transfer complete [graph/gf100-ctxctl/mmio.txt]
0x26: MMIO_RD - MMIO read complete [graph/gf100-ctxctl/mmio.txt]
0x27: MMIO_WRS - MMIO synchronous write complete [graph/gf100-ctxctl/mmio.txt]
0x28: BAR_0 - barrier #0 reached [see below]
0x29: BAR_1 - barrier #1 reached [see below]
0x2a: ??? - related to PCOUNTER [XXX]
0x2b: WATCHDOG - watchdog timer expired [see below]
0x2c: ??? - related to MEMIF [XXX]
0x2d: ??? - related to MEMIF [XXX]
0x2e: ??? - related to MEMIF [XXX]


Fermi CUDA processors 


Contents: 


Fermi CUDA ISA 


Contents 


* Fermi CUDA ISA 
- Introduction 
* Variants 
* Warps and thread types 
* Registers 
* Memory 
* Other execution state and resources 
— Instruction format 


— Instructions 


— Notes about scheduling data and dual-issue on GK104 
* DUAL ISSUE


Introduction 


This file deals with the description of the Fermi CUDA instruction set. CUDA stands for Compute Unified Device Architecture and refers to the fact that all types of shaders (vertex, tessellation, geometry, fragment, and compute) use nearly the same ISA and execute on the same processors (called streaming multiprocessors).


The Fermi CUDA ISA is used on Fermi (GF1xx) and older Kepler (GK10x) GPUs. Older (Tesla) CUDA GPUs use the Tesla ISA. Newer Kepler GPUs use the Kepler2 ISA.


Variants 


There are two variants of the Fermi ISA: the GF100 variant (used on Fermi GPUs) and the GK104 variant (used on 
first-gen Kepler GPUs). The differences are: 


* GF100: 
— surface access based on 8 bindable slots 
* GK104:
— surface access based on descriptor structures stored in c[]? 
— some new instructions 
— texbar instruction 


— every 8th instruction slot should be filled by a special sched instruction that describes dependencies and 
execution plan for the next 7 instructions 


Todo: rather incomplete. 


Warps and thread types 


Like on Tesla, programs are executed in warps. 
There are 6 program types on Fermi: 

* vertex programs 

* tessellation control programs

* tessellation evaluation programs

* geometry programs 

* fragment programs 


* compute programs 


Todo: and vertex programs 2? 
Todo: figure out the exact differences between these & the pipeline configuration business


Registers 


The registers in Fermi ISA are: 


* up to 63 32-bit GPRs per thread: $r0-$r62. These registers are used for all calculations, whether integer or floating-point. In addition, $r63 is a special register that's always forced to 0.


The amount of available GPRs per thread is chosen by the user as part of MP configuration, and can be selected per program type. For example, if the user enables 16 registers, $r0-$r15 will be usable and $r16-$r62 will be forced to 0. Since the MP has a rather limited amount of storage for GPRs, this configuration parameter determines how many active warps will fit simultaneously on an MP.


If a 64-bit operation is to be performed, any naturally aligned pair of GPRs can be treated as a 64-bit register: $rXd (which has the low half in $rX and the high half in $r(X+1), and X has to be even). Likewise, if a 128-bit operation is to be performed, any naturally aligned group of 4 registers can be treated as a 128-bit register: $rXq. The 32-bit chunks are assigned to $rX..$r(X+3) in order from lowest to highest.


Unlike Tesla, there is no way to access a 16-bit half of a register. 


* 7 1-bit predicate registers per thread: $p0-$p6. There's also $p7, which is always forced to 1. Used for conditional execution of instructions.


* 1 4-bit condition code register: $c. Has 4 bits: 


- bit 0: Z - zero flag. For integer operations, set when the result is equal to 0. For floating-point operations, set when the result is 0 or NaN.


- bit 1: S - sign flag. For integer operations, set when the high bit of the result is equal to 1. For floating-point operations, set when the result is negative or NaN.


- bit 2: C - carry flag. For integer addition, set when there is a carry out of the highest bit of the result.


- bit 3: O - overflow flag. For integer addition, set when the true (infinite-precision) result doesn't fit in the destination (considered to be a signed number).


Overall, works like one of the Tesla $c0-$c3 registers. 


* $flags, a flags register, which is just an alias to $c and $pX registers, allowing them to be saved/restored with 
one mov: 


— bits 0-6: $p0-$p6 
— bits 12-15: $c 
* A few dozen read-only 32-bit special registers, $sr0-$sr127: 
- $sr0 aka $laneid: XXX
- $sr2 aka $nphysid: XXX
- $sr3 aka $physid: XXX
- $sr4-$sr11 aka $pm0-$pm7: XXX
- $sr16 aka $vtxcnt: XXX
- $sr17 aka $invoc: XXX
- $sr18 aka $ydir: XXX
- $sr24-$sr27 aka $machine_id0-$machine_id3: XXX
- $sr28 aka $affinity: XXX
- $sr32 aka $tid: XXX
- $sr33 aka $tidx: XXX
- $sr34 aka $tidy: XXX
- $sr35 aka $tidz: XXX
- $sr36 aka $launcharg: XXX
- $sr37 aka $ctaidx: XXX
- $sr38 aka $ctaidy: XXX
- $sr39 aka $ctaidz: XXX
- $sr40 aka $ntid: XXX
- $sr41 aka $ntidx: XXX
- $sr42 aka $ntidy: XXX
- $sr43 aka $ntidz: XXX
- $sr44 aka $gridid: XXX
- $sr45 aka $nctaidx: XXX
- $sr46 aka $nctaidy: XXX
- $sr47 aka $nctaidz: XXX
- $sr48 aka $swinbase: XXX
- $sr49 aka $swinsz: XXX
- $sr50 aka $smemsz: XXX
- $sr51 aka $smembanks: XXX
- $sr52 aka $lwinbase: XXX
- $sr53 aka $lwinsz: XXX
- $sr54 aka $lpossz: XXX
- $sr55 aka $lnegsz: XXX
- $sr56 aka $lanemask_eq: XXX
- $sr57 aka $lanemask_lt: XXX
- $sr58 aka $lanemask_le: XXX
- $sr59 aka $lanemask_gt: XXX
- $sr60 aka $lanemask_ge: XXX
- $sr64 aka $trapstat: XXX
- $sr66 aka $warperr: XXX
- $sr80 aka $clock: XXX
- $sr81 aka $clockhi: XXX
Todo: figure out and document the SRs
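The register pairing rule, the $c flag definitions for integer addition and the $flags layout described above can be modelled in a few lines. This is a minimal Python sketch of the documented semantics only, not of the hardware itself: the function names are hypothetical, $r63 and the predicate forcing are not modelled, and the overflow test is the standard two's-complement reading of "the true result doesn't fit in a signed destination".

```python
MASK32 = 0xFFFFFFFF


def read_wide(regs, x, count):
    """Value of $rXd (count=2) or $rXq (count=4): $rX holds the lowest 32 bits."""
    if x % count:
        raise ValueError("wide registers must be naturally aligned")
    value = 0
    for i in range(count):
        value |= (regs[x + i] & MASK32) << (32 * i)
    return value


def add_with_flags(a, b):
    """32-bit a + b, returning (result, $c) with Z=bit 0, S=bit 1, C=bit 2, O=bit 3."""
    a &= MASK32
    b &= MASK32
    full = a + b
    res = full & MASK32
    z = int(res == 0)
    s = res >> 31
    c = full >> 32
    # Signed overflow: both inputs differ in sign from the result.
    o = ((a ^ res) & (b ^ res)) >> 31
    return res, z | s << 1 | c << 2 | o << 3


def pack_flags(preds, cc):
    """$flags value from $p0-$p6 (bits 0-6) and the 4-bit $c (bits 12-15)."""
    value = 0
    for i, p in enumerate(preds[:7]):
        value |= int(bool(p)) << i
    return value | (cc & 0xF) << 12
```

For example, `read_wide([1, 2, 3, 4], 2, 2)` treats the list as $r0-$r3 and returns $r2d, with $r3 as the high half.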


Memory 


The memory spaces in Fermi ISA are: 


C[]: code space. The only way to access this space is by executing code from it (there's no "read from code space" instruction). Unlike Tesla, the code segment is shared between all program types. It has three levels of cache (global, GPC, MP) that need to be manually flushed when its contents are modified by the user.


c0[] - c17[]: const spaces. Read-only and accessible from any program type in 8, 16, 32, 64, and 128-bit chunks. Each of the 18 const spaces of each program type can be independently bound to a range of VM space (with length divisible by 256) or disabled by the user. Cached like C[].


Todo: figure out the semi-special c16[]/c17[]. 


l[]: local space. Read-write and per-thread, accessible from any program type in 8, 16, 32, 64, and 128-bit units. It's directly mapped to VM space (although with heavy address mangling), and hence slow. Its per-thread length can be set to any multiple of 0x10 bytes.


s[]: shared space. Read-write, per-block, available only from compute programs, accessible in 8, 16, 32, 64, and 
128-bit units. Length per block can be selected by user. Has a locked access feature: every warp can have one 
locked location in s[], and all other warps will block when trying to access this location. Load with lock and 
store with unlock instructions can thus be used to implement atomic operations. 


Todo: size granularity? 


Todo: other program types? 


g[]: global space. Read-write, accessible from any program type in 8, 16, 32, 64, and 128-bit units. Mostly mapped to VM space. Supports some atomic operations. Can have two holes in address space: one of them mapped to s[] space, the other to l[] space, allowing unified addressing for the 3 spaces.


All memory spaces use 32-bit addresses, except g[] which uses 32-bit or 64-bit addresses.


Todo: describe the shader input spaces 
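The locked-access feature of s[] is what makes atomics possible: a warp loads a word with lock, computes, and stores it back with unlock while other warps are kept off that location. The toy, single-threaded model below illustrates only that idea. Everything in it is hypothetical: the class and method names are not instruction mnemonics, the one-lock-per-warp limit is not enforced, and a contended load simply reports failure where real hardware would block the warp.

```python
MASK32 = 0xFFFFFFFF


class SharedMemory:
    """Toy model of s[] with per-location locks owned by warps."""

    def __init__(self, words):
        self.data = [0] * words
        self.owner = {}  # word index -> id of the warp holding its lock

    def load_locked(self, warp, addr):
        """Load and lock `addr`; returns None if another warp holds the lock."""
        holder = self.owner.get(addr)
        if holder is not None and holder != warp:
            return None  # real hardware would block the warp here
        self.owner[addr] = warp
        return self.data[addr]

    def store_unlock(self, warp, addr, value):
        """Store to `addr` and release the lock this warp holds on it."""
        if self.owner.get(addr) != warp:
            raise RuntimeError("store-unlock without holding the lock")
        self.data[addr] = value & MASK32
        del self.owner[addr]


def try_atomic_add(smem, warp, addr, delta):
    """One attempt at an atomic add; returns the old value, or None if contended."""
    old = smem.load_locked(warp, addr)
    if old is None:
        return None
    smem.store_unlock(warp, addr, old + delta)
    return old
```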


Other execution state and resources 


There's also a fair bit of implicit state stored per-warp for control flow: 


Todo: describe me 


Other resources available to CUDA code are: 


* $t0-$t129: up to 130 textures per 3d program type, up to 128 for compute programs.

* $s0-$s17: up to 18 texture samplers per 3d program type, up to 16 for compute programs. Only used if linked texture samplers are disabled.


* $g0-$g7: up to 8 random-access read-write image surfaces. 


* Up to 16 barriers. Per-block and available in compute programs only. A barrier is basically a warp counter: a barrier can be increased or waited for. When a warp increases a barrier, its value is increased by 1. If a barrier would be increased to a value equal to a given warp count, it's set to 0 instead. When a barrier is waited for by a warp, the warp is blocked until the barrier's value is equal to 0.


Todo: not true for GK104. Not complete either. 
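The barrier behaviour described above (which, per the Todo, is the GF100 picture and may not hold on GK104) amounts to a wrapping counter. A minimal Python sketch with hypothetical names, modelling one barrier and one expected warp count:

```python
class Barrier:
    """Warp counter that wraps to 0 when it reaches the expected warp count."""

    def __init__(self, warp_count):
        self.warp_count = warp_count
        self.value = 0

    def arrive(self):
        """A warp increases the barrier by 1; reaching warp_count resets it to 0."""
        self.value += 1
        if self.value == self.warp_count:
            self.value = 0

    def may_proceed(self):
        """A warp waiting on the barrier stays blocked until the value is 0."""
        return self.value == 0


if __name__ == "__main__":
    bar = Barrier(warp_count=3)
    for _ in range(2):
        bar.arrive()
    print(bar.may_proceed())  # False: only 2 of 3 warps have arrived
    bar.arrive()
    print(bar.may_proceed())  # True: the third arrival wrapped the counter to 0
```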


Instruction format 


Todo: write me 


Instructions 


Todo: write me 


Notes about scheduling data and dual-issue on GK104+


There should be one "sched" instruction at each 0x40-byte boundary, i.e. one for each group of 7 "normal" instructions. For each of these 7 instructions, "sched" contains 1 byte of information:


0x00     : no scheduling info, suspend warp for 32 cycles

0x04     : dual-issue the instruction together with the next one (**)

0x20 | n : suspend warp for n cycles before trying to issue the next instruction
           (0 <= n < 0x20)

0x40     : ???

0x80     : ???


(**) Obviously you can't use 0x04 on 2 consecutive instructions
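The sched byte encoding above can be decoded as follows. This is a sketch: it assumes 8-byte instruction slots (eight slots per 0x40 bytes, the first one holding sched) and, hypothetically, that the i-th sched byte describes the i-th following instruction; the 0x40 and 0x80 encodings are reported as unknown since their meaning is not documented. The function names are illustrative.

```python
def decode_sched_byte(b):
    """Interpret one GK104 scheduling byte using the table above."""
    if b == 0x00:
        return ("wait", 32)        # no scheduling info: suspend for 32 cycles
    if b == 0x04:
        return ("dual-issue", None)
    if 0x20 <= b < 0x40:
        return ("wait", b & 0x1F)  # 0x20 | n: suspend for n cycles
    return ("unknown", b)          # 0x40, 0x80 and anything else


def is_sched_slot(addr):
    """True if the instruction at byte address `addr` is a sched instruction."""
    return addr % 0x40 == 0


def sched_byte_index(addr):
    """Index (0-6) of the sched byte for the normal instruction at `addr` (assumed order)."""
    slot = (addr % 0x40) // 8
    if slot == 0:
        raise ValueError("address is a sched slot")
    return slot - 1
```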


If latency information is inaccurate and you encounter an instruction where its dependencies are not yet satisfied, the 
instruction is re-issued each cycle until they are. 


EXAMPLE
sched 0x28 0x20: inst. issued / inst. executed = 6/2
sched 0x29 0x20: inst. issued / inst. executed = 5/2
sched 0x2c 0x20: inst. issued / inst. executed = 2/2
for
mov b32 $r0 c0[0]
set $p0 eq u32 $r0 0x1
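These measurements fit a simple model: if the dependency between the two instructions takes L cycles and the first instruction's sched byte requests a wait of n cycles, the second instruction is re-issued L - n extra times before it executes. A latency of 12 cycles reproduces all three measurements. That latency is inferred here from the example, not documented, and the model is only a sketch; the function name is hypothetical.

```python
def issues_for_dependent_pair(wait_cycles, latency):
    """Issue count for two dependent instructions when the first one's sched byte
    requests `wait_cycles` and its result is ready after `latency` cycles.
    Each cycle the dependency is still unsatisfied costs one extra re-issue."""
    return 2 + max(0, latency - wait_cycles)


ASSUMED_LATENCY = 12  # inferred from the measurements above, not a documented value

if __name__ == "__main__":
    for sched_byte, measured in [(0x28, 6), (0x29, 5), (0x2C, 2)]:
        predicted = issues_for_dependent_pair(sched_byte & 0x1F, ASSUMED_LATENCY)
        print(hex(sched_byte), predicted, measured)
```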


DUAL ISSUE 


General constraints for which instructions can be dual-issued: 
* not if same dst 


* not if both access different 16-byte ranges inside cX[] 
* not if any performs larger than 32-bit memory access

* a = b, b = c is allowed

* g[] access can't be dual-issued, ld seems to require 2 issues even for b32

* f64 ops seem to count as 3 instruction issues and can't be dual-issued with anything (GeForce only?)


SPECIFIC (a X b means a cannot be dual-issued with any of b):

mov gpr   X
mov sreg  X mov sreg
add int   X
shift     X shift, mul int, cvt any, ins, popc
mul int   X mul int, shift, cvt any, ins, popc
cvt any   X cvt any, shift, mul int, ins, popc
ins       X ins, shift, mul int, cvt any, popc
popc      X popc, shift, mul int, cvt any, ins
set any   X set any
logop     X
slct      X
ld l      X ld l, ld s
ld s      X ld s, ld l
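The SPECIFIC table can be turned into a lookup. A minimal sketch that checks only the instruction-class conflicts (not the general same-destination, c[], memory-width, g[] or f64 constraints); rows with nothing after X are read as having no class-specific conflicts, and the class strings are the table's own labels. `may_dual_issue` is a hypothetical name.

```python
# Instruction classes that may not be dual-issued together, transcribed from
# the SPECIFIC table above. Rows with nothing after "X" map to an empty set.
CONFLICTS = {
    "mov gpr": set(),
    "mov sreg": {"mov sreg"},
    "add int": set(),
    "shift": {"shift", "mul int", "cvt any", "ins", "popc"},
    "mul int": {"mul int", "shift", "cvt any", "ins", "popc"},
    "cvt any": {"cvt any", "shift", "mul int", "ins", "popc"},
    "ins": {"ins", "shift", "mul int", "cvt any", "popc"},
    "popc": {"popc", "shift", "mul int", "cvt any", "ins"},
    "set any": {"set any"},
    "logop": set(),
    "slct": set(),
    "ld l": {"ld l", "ld s"},
    "ld s": {"ld s", "ld l"},
}


def may_dual_issue(a, b):
    """True unless either class lists the other as a conflict.

    Only the class table is consulted; the general constraints (same
    destination, c[] ranges, >32-bit or g[] accesses, f64) are not checked.
    """
    return b not in CONFLICTS.get(a, set()) and a not in CONFLICTS.get(b, set())
```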


GF100 Fermi 3D objects 


Contents 


* GF100 Fermi 3D objects


— Introduction 


Todo: write me 


Introduction 


Todo: write me 


GF100 Fermi compute objects 


Contents 


* GF100 Fermi compute objects


— Introduction 


Todo: write me 


Introduction 


Todo: write me 


2.9. PGRAPH: 2d/3d graphics and compute engine 265 




2.9.13 GK104 Kepler graphics and compute engine 


Contents: 


GK104 Kepler 3D objects 


Contents 


* GK104 Kepler 3D objects


— Introduction 


Todo: write me 


Introduction 


Todo: write me 


GK104 Kepler compute objects 


Contents 


* GK104 Kepler compute objects 


— Introduction 


Todo: write me 


Introduction 


Todo: write me 


2.9.14 GM107 Maxwell graphics and compute engine 


Contents: 
GM107 Maxwell 3D objects


Contents 


* GM107 Maxwell 3D objects


— Introduction 


Todo: write me 


Introduction 


Todo: write me 


GM107 Maxwell compute objects 


Contents 


* GM107 Maxwell compute objects 


— Introduction 


Todo: write me 


Introduction 


Todo: write me 


Maxwell CUDA processors 


Contents: 


Maxwell CUDA ISA 


Contents 


* Maxwell CUDA ISA 


- Introduction 
- Instructions


Introduction 


This is currently not a complete reference of known functionality, but a place where behaviour that is not obvious from envydis/gm107.c can be documented.


Some notes for reading this documentation: 


An instruction's documentation is split into three sections: the forms text, the description, and the behaviour text.

The first operand is usually the destination. 

The behaviour text uses the notation SRC<n>/DST, while the forms text does not. 

REG<n> is a reference to a register. 

CB<n> is a reference to the contents of a constant buffer. 

U<b>_<n> is a b-bit unsigned immediate value.

S<b>_<n> is a b-bit signed immediate value.

Some subtleties may lie in an instruction's description if putting it in the behaviour text would be too verbose. 

add_with_carry(a, b) returns the sum of a and b using the carry flag, and writes the carry flag.
- It does not use and/or set the carry flag if the appropriate instruction flags are not specified.

Instruction flags in between [ and ] are optional. 

The order of the flags (even in between [ and ]) is what is expected by envyas. 

The “carry flag" or “condition code" is not a instruction flag, but a register. 


The terms "condition code" and "carry flag" are used interchangeably, depending on which is clearest. 
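A minimal Python model of add_with_carry as described in these notes, where the x flag makes the addition consume the carry flag and the cc flag makes it write the carry flag back. The keyword arguments standing in for the two instruction flags are illustrative, and only the carry bit of the condition code is modelled.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry(a, b, carry_in=0, x=False, cc=False):
    """Return (sum, carry flag afterwards) for a 32-bit add.

    x:  consume the incoming carry flag (the instruction's x flag).
    cc: write the carry out of bit 31 back to the flag (the cc flag);
        otherwise the flag keeps its previous value.
    """
    full = (a & MASK32) + (b & MASK32) + (carry_in & 1 if x else 0)
    carry_out = full >> 32 if cc else carry_in
    return full & MASK32, carry_out


if __name__ == "__main__":
    # 64-bit addition from two 32-bit halves: the low add sets the carry,
    # the high add consumes it.
    lo, carry = add_with_carry(0xFFFFFFFF, 1, cc=True)
    hi, carry = add_with_carry(0, 0, carry, x=True, cc=True)
    print(hex(hi << 32 | lo))  # prints 0x100000000
```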


Instructions 


The instructions are roughly divided into the following groups: 


Integer Arithmetic Instructions 


Integer Arithmetic Instructions 


Contents 


* Integer Arithmetic Instructions
  - Introduction
  - Common Flags
    * neg
    * h0/h1
    * x
    * cc
  - Addition: iadd3
  - Multiply-add: xmad


Introduction 
Common Flags 
neg 

Negate the operand. 
h0/h1 


An optional flag that can be either h0 or h1. With h1, the high 16 bits of the operand are used. With h0, the low 16 bits are used.


x


Use the condition code.


cc


Set the condition code.


Addition: iadd3 


iadd3 [mode,x,cc] REG0 [neg,h0/h1] REG1 [neg,h0/h1] REG2 [neg,h0/h1] REG3
iadd3 [x,cc] REG0 [neg] REG1 [neg] CB2 [neg] REG3
iadd3 [x,cc] REG0 [neg] REG1 [neg] S20_2 [neg] REG3


Adds three integers. The flag mode may optionally be rs or ls.


switch (mode) {
case rs:
    /* yes, the intermediate addition creates a 33-bit integer */
    uint32_t intermediate = (uint33_t(SRC1) + uint33_t(SRC2)) >> 16;
    DST = add_with_carry(intermediate, SRC3);
    break;
case ls: DST = add_with_carry(((SRC1 + SRC2) << 16), SRC3); break;
default: DST = add_with_carry((SRC1 + SRC2), SRC3); break;
}
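The mode handling of iadd3 can be checked with a small executable model. This sketch covers only the behaviour text above: the neg and h0/h1 operand modifiers are not modelled, and add_with_carry is re-implemented as in the notes (x consumes, cc writes the carry flag) so the block stands alone. The parameter names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry(a, b, carry_in=0, x=False, cc=False):
    full = (a & MASK32) + (b & MASK32) + (carry_in & 1 if x else 0)
    return full & MASK32, (full >> 32 if cc else carry_in)


def iadd3(src1, src2, src3, mode=None, carry_in=0, x=False, cc=False):
    """Model of the iadd3 behaviour text; mode is None, "rs" or "ls"."""
    src1 &= MASK32
    src2 &= MASK32
    if mode == "rs":
        # The intermediate sum is kept to 33 bits before shifting right by 16.
        intermediate = (src1 + src2) >> 16
    elif mode == "ls":
        intermediate = ((src1 + src2) << 16) & MASK32
    else:
        intermediate = (src1 + src2) & MASK32
    return add_with_carry(intermediate, src3, carry_in, x, cc)
```

The rs case shows why the comment insists on 33 bits: 0xffffffff + 1 would wrap to 0 in 32 bits, but the 33-bit sum shifted right by 16 gives 0x10000.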
Multiply-add: xmad


xmad [src1_type,src2_type,psl,mrg,cmode,x,cc] REG0 [h1] REG1 [h1] REG2 REG3
xmad [src1_type,src2_type,cmode,x,cc] REG0 [h1] REG1 [h1] REG2 CB3
xmad [src1_type,src2_type,psl,mrg,cmode,x,cc] REG0 [h1] REG1 [h1] CB2 REG3
xmad [src1_type,src2_type,psl,mrg,cmode,x,cc] REG0 [h1] REG1 S20_2 REG3


Multiplies two 16-bit integers and adds a 32-bit integer, along with a bunch of other stuff.


If one of src1_type or src2_type is set, the other must also be set. They can be s16 u16, u16 s16 or s16 s16.


The flag cmode may optionally be clo, chi, csfu or cbcc. The cbcc mode may not be specified for the constant buffer forms.


uint32_t p_a = SRC1.h1 ? SRC1>>16 : SRC1&0xffff;
uint32_t p_b = SRC2.h1 ? SRC2>>16 : SRC2&0xffff;
if (src1_type == s16) p_a = sign_extend_from_16_to_32(p_a);
if (src2_type == s16) p_b = sign_extend_from_16_to_32(p_b);

uint32_t p = p_a * p_b;
if (psl) p <<= 16;

uint32_t c = SRC3;
switch (cmode) {
case clo: c = c & 0xffff; break;
case chi: c = c >> 16; break;
case cbcc: c += SRC2 << 16; break;
case csfu: {
    if (p_a==0 || p_b==0) break;
    // v & 0x80000000 -> as_twos_complement(v) < 0
    if (p_a & 0x80000000) c -= 65536;
    if (p_b & 0x80000000) c -= 65536;
    break;
}
}

DST = add_with_carry(p, c);
if (mrg) DST = (DST & 0xffff) | (SRC2<<16);
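The xmad behaviour text can likewise be run as a Python sketch. The boolean parameters stand in for the instruction flags (h1 on each source, s16 source types, psl, mrg) and cmode takes the flag names as strings; these parameter names are illustrative, and carry handling (x, cc) is left out.

```python
MASK32 = 0xFFFFFFFF


def sext16(v):
    """Sign-extend a 16-bit value to 32 bits (as an unsigned 32-bit pattern)."""
    v &= 0xFFFF
    return (v - 0x10000 if v & 0x8000 else v) & MASK32


def xmad(src1, src2, src3, src1_h1=False, src2_h1=False, src1_s16=False,
         src2_s16=False, psl=False, mrg=False, cmode=None):
    """Model of the xmad behaviour text, without x/cc carry handling."""
    src1, src2, src3 = src1 & MASK32, src2 & MASK32, src3 & MASK32
    p_a = src1 >> 16 if src1_h1 else src1 & 0xFFFF
    p_b = src2 >> 16 if src2_h1 else src2 & 0xFFFF
    if src1_s16:
        p_a = sext16(p_a)
    if src2_s16:
        p_b = sext16(p_b)

    p = (p_a * p_b) & MASK32
    if psl:
        p = (p << 16) & MASK32

    c = src3
    if cmode == "clo":
        c &= 0xFFFF
    elif cmode == "chi":
        c >>= 16
    elif cmode == "cbcc":
        c = (c + (src2 << 16)) & MASK32
    elif cmode == "csfu" and p_a != 0 and p_b != 0:
        # Subtract 65536 for each negative (sign-extended) factor.
        if p_a & 0x80000000:
            c = (c - 65536) & MASK32
        if p_b & 0x80000000:
            c = (c - 65536) & MASK32

    dst = (p + c) & MASK32
    if mrg:
        dst = (dst & 0xFFFF) | ((src2 << 16) & MASK32)
    return dst
```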


2.9.15 Pipeline Bundles 


Contents 


* Pipeline Bundles
  - Introduction
  - Celsius/Kelvin/Rankine/Curie bundles
  - Texture bundles
  - Register combiner bundles
  - ROP bundles
  - RASTER bundles
  - Misc bundles


Introduction 


By its nature, every stage of the graphics pipeline processes a different kind of data — the format of packets sent 
between pipeline units varies greatly. However, there is a kind of data that is supported on most unit interconnections: 
pipeline bundles. Bundles are used for data that needs to be passed unchanged through many stages of the pipeline 
— most of them directly from the FE. Every unit in the pipeline will only recognize and act on a small subset of the 
bundles, and pass through all other bundles. 


Bundles first appeared on Celsius, where they consist of a 6-bit bundle type and 32-bit bundle data. On Kelvin, the bundle type space was reorganized and extended to 9 bits. On Tesla, bundle types were reorganized again and extended to 16 bits.


Most bundles are so-called "state bundles": their purpose is to pass pipeline configuration data from the FE to all interested pipeline units. The pipeline units that need to know a particular piece of configuration data will watch for the corresponding state bundle, updating their internal configuration registers when such a bundle passes through. In some cases, units will recognize that no further units in the pipeline need a given state bundle and won't pass it any further, but usually state bundles travel unchanged from the FE right until the ROPs.


Before Tesla, state bundles usually contained packed state: many pieces of configuration affecting related units were collected into a single bundle. The FE thus keeps a copy of the last value sent for every state bundle, which is visible through MMIO. Whenever a method is processed that changes a piece of configuration, the relevant bits in the corresponding state bundle shadow register are updated, and the entire bundle is resubmitted through the pipeline. The shadow registers are also used for context switching: to save pipeline configuration, it's enough to just dump the shadow registers. On restore, writing the shadow registers will automatically submit the given bundle down the pipeline, thus restoring the state of every unit involved.


Since Tesla, state bundles usually correspond directly to class methods, and the FE doesn't need to keep track of most 
of them (though some are tracked in shadow registers for pre-launch state validation purposes). Instead, state bundles 
are context-switched by saving and restoring their copies kept on every involved pipeline unit. 


Other bundles are used to trigger some kind of action in a pipeline unit that is different from the main mode of operation (i.e. rendering primitives): buffer clears, queries, and so on. These are called trigger bundles.
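The pass-through rule and the pre-Tesla shadow-register scheme described above can be sketched as a toy model. This is purely illustrative: the class names, the bundle type numbers used in the test, and the notion of a unit "watching" a set of types are stand-ins for the behaviour in the text, not a description of real hardware interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Bundle:
    btype: int
    data: int


@dataclass
class Unit:
    """A pipeline unit: acts on the bundle types it watches, passes everything on."""
    name: str
    watched: set
    config: dict = field(default_factory=dict)

    def process(self, bundle):
        if bundle.btype in self.watched:
            self.config[bundle.btype] = bundle.data  # update internal copy
        return bundle  # passed through unchanged


class FrontEnd:
    """Pre-Tesla style FE that keeps a shadow copy of every state bundle sent."""

    def __init__(self, units):
        self.units = units
        self.shadow = {}

    def submit(self, bundle):
        self.shadow[bundle.btype] = bundle.data
        for unit in self.units:
            bundle = unit.process(bundle)

    def update_bits(self, btype, mask, value):
        """A method changes some bits of a packed state bundle: resend all of it."""
        data = (self.shadow.get(btype, 0) & ~mask) | (value & mask)
        self.submit(Bundle(btype, data))

    def restore(self, saved_shadow):
        """Context restore: writing each shadow register resubmits its bundle."""
        for btype, data in saved_shadow.items():
            self.submit(Bundle(btype, data))
```

Two methods touching different bits of the same packed bundle each resend the whole merged word, and a saved copy of the shadow registers is enough to rebuild every unit's configuration.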


Celsius/Kelvin/Rankine/Curie bundles 


Table 12: Celsius/Kelvin/Rankine/Curie bundles

Celsius | Kelvin | Rankine/Curie | Type | Used by | Name
- | 100[20] | 000[20] | state-ish | RASTER? | POLYGON_STIPPLE
14 | 020[8] | 020[8] | state | RC? | RC_FACTOR_A
15 | 028[8] | 028[8] | state | RC? | RC_FACTOR_B
10[2] | 030[8] | 030[8] | state | RC? | RC_IN_ALPHA
16[2] | 038[8] | 038[8] | state | RC? | RC_OUT_ALPHA
12[2] | 040[8] | 040[8] | state | RC? | RC_IN_COLOR
18[2] | 048[8] | 048[8] | state | RC? | RC_OUT_COLOR
- | 050 | 050 | state | RC? | RC_CONFIG
1a | 051 | 051 | state | RC? | RC_FINAL_A
1b | 052 | 052 | state | RC? | RC_FINAL_B
1c | 053 | 053 | state | ROP? | CONFIG_A
1d | 054 | 054 | state | ROP? | STENCIL_A
1e | 055 | 055 | state | ROP? | STENCIL_B
1f | 056 | 056 | state | ASSM,ROP? | CONFIG_B



- | - | 057 | state | RASTER? | VIEWPORT_OFFSET
- | - | 058 | state | SHADER? | PS_OFFSET
35* | 059* | 059 | state | ZCULL | CLIPID_ID
31* | 05a* | 05a | state | ZCULL | CLIPID_BASE
32* | 05b* | 05b | state | ZCULL | CLIPID_LIMIT
33* | 05c* | 05c | state | ZCULL | CLIPID_OFFSET
34* | 05d* | 05d | state | ZCULL | CLIPID_PITCH
- | 05e* | 05e | state | RASTER? | LINE_STIPPLE
- | 05f* | 05f | state | ROP? | RT_ENABLE
23 | 060 | 060 | state | RC? | FOG_COLOR
- | 061[2] | 061[2] | state | | FOG_COEFF
2a | 063 | 063 | state | ASSM | POINT_SIZE
22 | 064 | 064 | state | RASTER? | RASTER
- | 065 | 065 | state | SHADER? | TEX_SHADER_CULL_MODE
- | 066 | 066 | state | SHADER? | TEX_SHADER_MISC
- | 067 | 067 | state | SHADER? | TEX_SHADER_OP
- | 068 | 068 | state | ??? | FENCE_OFFSET
- | 069 | - | state | TEX? | TEX_ZCOMP
- | - | 069 | state | |
- | 06a | 06a | state | | UNK1E68
- | 06b[2] | 06b[2] | state | RC? | RC_FINAL_FACTOR
- | 06d[2] | 06d[2] | state | RASTER? | CLIP_HV
- | 000 | 06f | state | ROP? | MULTISAMPLE
- | 003[3] | 070[3] | state | SHADER? | TEX_UNK10
- | 006[3] | 073[3] | state | SHADER? | TEX_UNK11
- | 009[3] | 076[3] | state | SHADER? | TEX_UNK13
- | 00c[3] | 079[3] | state | SHADER? | TEX_UNK12
- | 00f[3] | 07c[3] | state | SHADER? | TEX_UNK15
- | 012[3] | 07f[3] | state | SHADER? | TEX_UNK14
20 | 001 | 082 | state | ROP? | BLEND
21 | 002 | 083 | state | ROP? | BLEND_COLOR
2b[2] | 019[2] | 084[2] | state | RASTER? | CLEAR_HV
- | 01b | 086 | state | RASTER? | CLEAR_COLOR
- | - | 087 | state | ROP? | STENCIL_C
- | - | 088 | state | ROP? | STENCIL_D
- | - | 089 | state | RASTER? | CLIP_PLANE_ENABLE
- | - | 08b[2] | state | RASTER? | VIEWPORT_HV
- | - | 08d[2] | state | RASTER? | SCISSOR_HV
- | 091[8] | 091[8] | state | RASTER? | CLIP_RECT_HORIZ
- | 099[8] | 099[8] | state | RASTER? | CLIP_RECT_VERT
36 | 0a1 | 0a1 | state | ZCULL? | Z_CONFIG
37 | 0a2 | 0a2 | state | ZCULL? | CLEAR_ZETA
38 | - | - | state | ZCULL? | UNK3FC
27 | 0a3 | 0a3 | state | RASTER? | DEPTH_RANGE_FAR
26 | 0a4 | 0a4 | state | RASTER? | DEPTH_RANGE_NEAR
- | 0a5[2] | 0a5[2] | state | TEX? | DMA_TEX
- | 0a7[2] | 0a7[2] | state | IDX | DMA_VTX
25 | 0a9 | 0a9 | state | RASTER? | POLYGON_OFFSET_UNITS
24 | 0aa | 0aa | state | RASTER? | POLYGON_OFFSET_FACTOR



- | 0ab[3] | 0ab[3] | state | SHADER? | TEX_SHADER_CONST_EYE
- | 0ae* | - | state | |
- | - | 0af | state | | RANKINE_UNK0A40
2d* | 0b0* | 0b0 | state | ZCULL | ZCULL_BASE
2e* | 0b1* | 0b1 | state | ZCULL | ZCULL_LIMIT
2f* | 0b2* | 0b2 | state | ZCULL | ZCULL_OFFSET
30* | 0b3* | 0b3 | state | ZCULL | ZCULL_PITCH
- | 0b4[4]* | 0b4[4] | state | | KELVIN_UNK1DC0
- | 0b8* | 0b8 | state | | KELVIN_UNK1DBC
- | - | 0b9 | state | IDX | PRIMITIVE_RESTART_ENABLE
- | - | 0ba | state | IDX | PRIMITIVE_RESTART_INDEX
- | - | 0bb | state | RASTER? | TXC_CYLWRAP
- | - | 0bc[8] | state-ish | SHADER? | PS_PREFETCH_DATA
- | - | 0c4 | state | SHADER? | PS_CONTROL
- | - | 0c5 | state | RASTER? | TXC_ENABLE
- | - | 0c6 | state? | ???? | apparently involved in clears
- | - | 0c7 | state | RASTER? | WINDOW_OFFSET
00[2] | 089[4] | 100[10] | state | TEX? | TEX_OFFSET
04[2] | 081[4] | 110[10] | state | TEX? | TEX_FORMAT
- | 061[4] | 120[10] | state | TEX? | TEX_WRAP
06[2] | 073[4] | 130[10] | state | TEX? | TEX_CONTROL
08[2] | 077[4] | 140[10] | state | TEX? | TEX_PITCH
0a[2] | 07b[2] | - | state | TEX? | TEX_UNK238
0e[2] | 07d[4] | 150[10] | state | TEX? | TEX_FILTER
0c[2] | 085[4] | 160[10] | state | TEX? | TEX_RECT
- | 003[4] | 170[10] | state | TEX? | TEX_BORDER_COLOR
02[2] | 08d[4] | 180[10] | state | TEX? | TEX_PALETTE
28[2] | 016[4] | 190[10] | state | TEX? | TEX_COLOR_KEY
- | - | 14c | trigger? | ???? | apparently involved in clears
- | - | 1f7 | trigger? | | UNKA08
- | - | 1f8 | trigger | IDX | PS_PREFETCH_TRIGGER
3f* | 1f9* | 1f9 | trigger | ZCULL | INVALIDATE_ZCULL
- | 1% | 1% | trigger | ? | FENCE_WRITE_B
- | 1fc | 1fc | trigger | ROP? | ZPASS_COUNTER_READ
- | 1fd | 1fd | trigger | ROP? | ZPASS_COUNTER_RESET
3e* | 1fe* | 1fe | trigger | ZCULL | CLEAR_CLIPID_TRIGGER
3d* | ? | ? | trigger | ZCULL | CLEAR_ZCULL_TRIGGER


Texture bundles 


Todo: write me 


TEX_OFFSET: A simple 32-bit texture offset. Should be aligned to 0x80 bytes.
TEX_FORMAT [NV10:NV20]:
* bit 1: DMA
  - 0: A
  - 1: B
* bit 2: CUBE_MAP
* bit 3: CELSIUS_MTHD_TEX_UNK258 [NV17:NV20]
* bit 4: ORIGIN_ZOH
  - 0: CENTER
  - 1: CORNER
* bit 6: ORIGIN_FOH
* bits 7-11: FORMAT
* bits 12-15: MIPS - number of mipmap levels
* bits 16-19: SIZE_S - log2 of texture width, if not RECT
* bits 20-23: SIZE_T - log2 of texture height, if not RECT
* bits 24-26: WRAP_S
* bit 27: WRAP_S_CYL
* bits 28-30: WRAP_T
* bit 31: WRAP_T_CYL
On NV20, WRAP_* have been moved to a new TEX_WRAP bundle.
TEX_FORMAT [NV20:]:
* bit 1: DMA
  - 0: A
  - 1: B
* bit 2: CUBE_MAP
* bit 3: BORDER_TYPE [NV20:]
  - 0: INCLUDED
  - 1: CONST
* bit 4: ORIGIN_ZOH [NV20:NV30]
* bit 5: ORIGIN_FOH [NV20:NV30]
* bits 6-7: MODE [NV20:NV30]
  - 1: 1D
  - 2: 2D [also used for CUBE]
  - 3: 3D
* bits 8-14: FORMAT [NV20:NV40]
* bits 8-15: FORMAT [NV40:]
* bits 16-19: MIPS - number of mipmap levels [NV20:]
* bits 20-23: SIZE_S - log2 of texture width, if not RECT
* bits 24-27: SIZE_T - log2 of texture height, if not RECT
* bits 28-31: SIZE_R - log2 of texture depth, if 3D
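The TEX_FORMAT [NV20:] layout can be unpacked with a few shifts and masks. A minimal sketch with hypothetical helper names; it decodes the NV20:NV30-only ORIGIN and MODE bits unconditionally and takes a flag for the wider NV40 FORMAT field. The format code 0x05 in the example is arbitrary.

```python
def field(value, lo, hi):
    """Extract bits lo..hi (inclusive) of `value`."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_tex_format_nv20(value, nv40=False):
    """Split a TEX_FORMAT [NV20:] bundle word into named fields."""
    return {
        "DMA": field(value, 1, 1),
        "CUBE_MAP": field(value, 2, 2),
        "BORDER_TYPE": field(value, 3, 3),
        "ORIGIN_ZOH": field(value, 4, 4),   # NV20:NV30 only
        "ORIGIN_FOH": field(value, 5, 5),   # NV20:NV30 only
        "MODE": field(value, 6, 7),         # NV20:NV30 only; 1=1D, 2=2D/CUBE, 3=3D
        "FORMAT": field(value, 8, 15 if nv40 else 14),
        "MIPS": field(value, 16, 19),
        "SIZE_S": field(value, 20, 23),     # log2 of width
        "SIZE_T": field(value, 24, 27),     # log2 of height
        "SIZE_R": field(value, 28, 31),     # log2 of depth (3D)
    }


if __name__ == "__main__":
    # A 256x128 2D texture with 9 mipmap levels and format code 0x05.
    word = (7 << 24) | (8 << 20) | (9 << 16) | (0x05 << 8) | (2 << 6)
    fields = decode_tex_format_nv20(word)
    print(1 << fields["SIZE_S"], 1 << fields["SIZE_T"], fields["MIPS"])  # 256 128 9
```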
FORMAT can be one of: 




* 0x00: ???
* 0x01: ???
* 0x02: ???
* 0x03: ???
* 0x04: ???
* 0x05: ???
* 0x06: ???
* 0x07: ???
* 0x08: ??? [:NV30]
* 0x09: ??? [:NV30]
* 0x0a: ??? [:NV30]
* 0x0b: ???
* 0x0c: ??? DXT
* 0x0e: ??? DXT
* 0x0f: ??? DXT
* 0x10: ??? RECT
* 0x11: ??? RECT
* 0x12: ??? RECT
* 0x13: ??? RECT
* 0x14: ??? RECT
* 0x15: ??? RECT
* 0x16: ??? RECT
* 0x17: ??? RECT
* 0x18: ??? RECT
* 0x19: ??? RECT [NV17:]
* 0x1a: ??? RECT [NV17:]
* 0x1b: ??? RECT [NV17:]
* 0x1c: ??? RECT [NV17:]
* 0x19: ??? [NV20:]
* 0x1a: ??? [NV20:]
* 0x1b: ??? RECT [NV20:]
* 0x1c: ??? RECT [NV20:]
* 0x1d: ??? RECT [NV20:]
* 0x1e: ??? RECT [NV20:]
* 0x1f: ??? RECT [NV20:]
* 0x20: ??? RECT [NV20:]
* 0x24: ??? RECT DXT [NV20:]
* 0x25: ??? RECT DXT [NV20:]
* 0x26: ??? RECT [NV20:]
* 0x27: ??? [NV20:]
* 0x28: ??? [NV20:]
* 0x29: ??? [NV20:]
* 0x2a: ??? ZCOMP [NV20:]
* 0x2b: ??? ZCOMP [NV20:]
* 0x2c: ??? ZCOMP [NV20:]
* 0x2d: ??? ZCOMP [NV20:]
* 0x2e: ??? RECT ZCOMP [NV20:]
* 0x2f: ??? RECT ZCOMP [NV20:]
* 0x30: ??? RECT ZCOMP [NV20:]
* 0x31: ??? RECT ZCOMP [NV20:]
* 0x32: ??? [NV20:]
* 0x33: ??? [NV20:]
* 0x34: ??? RECT DXT [NV20:]
* 0x35: ??? RECT [NV20:]
* 0x36: ??? RECT [NV20:]
* 0x37: ??? RECT [NV20:]
* 0x38: ???
* 0x39: ???
* 0x3a: ???
* 0x3b: ???
* 0x3c: ???
* 0x3d: ??? RECT [NV20:]
* 0x3e: ??? RECT [NV20:]
* 0x3f: ??? RECT [NV20:]
* 0x40: ??? RECT [NV20:]
* 0x41: ??? RECT [NV20:]
* 0x42: ??? [NV25:]
* 0x43: ??? RECT [NV25:]
* 0x44: ??? [NV25:]
* 0x45: ??? [NV25:]
* 0x46: ??? RECT [NV25:]
* 0x47: ??? RECT [NV25:]
* 0x48: ??? RECT [NV25:]
* 0x49: ??? [NV25:]
* 0x4a: ??? RECT [NV30:]
* 0x4b: ??? RECT [NV30:]
* 0x4c: ??? RECT [NV30:]
* 0x4d: ??? RECT [NV30:]
* 0x4e: ??? [NV30:]

TEX_WRAP [NV20:]:
* bits 0-2: WRAP_S
* bit 4: WRAP_S_CYL [NV20:NV30]
* bits 4-7: ANISO_MIP_FILTER_OPTIMIZATION? [NV30:]
* bits 8-10: WRAP_T
* bit 12: WRAP_T_CYL [NV20:NV30]
* bit 12: EXPAND_NORMAL [NV30:]
* bits 13-14: RANKINE_TEX_WRAP_UNK24 [NV30:]
  - 0: ???
  - 1: ???
  - 2: ???
* bits 16-18: WRAP_R
* bits 19-23: FILTER_OPT_TRILINEAR [NV30:]
* bit 20: WRAP_R_CYL [NV20:NV30]
* bits 24-27: GAMMA_DECREASE_FILTER? [NV30:]
* bit 24: WRAP_Q_CYL [NV20:NV30]
* bits 28-31: ZCOMP [NV30:] - on NV20, this was a separate bundle instead.

On Rankine, WRAP_*_CYL have been moved to a new TXC_CYLWRAP bundle.


WRAP can be one of:

* 1: REPEAT
* 2: MIRRORED_REPEAT
* 3: CLAMP_TO_EDGE
* 4: CLAMP_TO_BORDER
* 5: CLAMP

TEX_CONTROL:
* bit 0: COLOR_KEY_ENABLE?
* bits 1-3: ???
* bits 4-5: ANISOTROPY
* bits 6-17: MAX_LOD, in 4.8 fixed-point format
* bits 18-29: MIN_LOD, in 4.8 fixed-point format
* bit 30: ENABLE - if set, this texture is active
* bit 31: ??? [NV40:]
TEX_PITCH:
* bits 0-1: S1_W [NV30:]
  - 0: W
  - 1: Z
  - 2: Y
  - 3: X
* bits 2-3: S1_Z [NV30:]
* bits 4-5: S1_Y [NV30:]
* bits 6-7: S1_X [NV30:]
* bits 8-9: S0_W [NV30:]
* bits 10-11: S0_Z [NV30:]
* bits 12-13: S0_Y [NV30:]
* bits 14-15: S0_X [NV30:]
* bits 16-31: PITCH
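A decoding sketch for TEX_PITCH. It assumes that every S0_*/S1_* field uses the four-value encoding listed under S1_W, since the text spells it out only once. The meaning of the two selector sets is not documented here, so the sketch only names the selected component. Names are hypothetical.

```python
# Selector encoding listed under S1_W; assumed to apply to every S0_*/S1_* field.
SELECTOR = {0: "W", 1: "Z", 2: "Y", 3: "X"}

# Field order from bit 0 upwards, 2 bits per field [NV30:].
SELECTOR_FIELDS = ["S1_W", "S1_Z", "S1_Y", "S1_X", "S0_W", "S0_Z", "S0_Y", "S0_X"]


def decode_tex_pitch(word):
    """Decode a TEX_PITCH word (hypothetical helper; selector fields are NV30+ only)."""
    fields = {name: SELECTOR[(word >> (2 * i)) & 3]
              for i, name in enumerate(SELECTOR_FIELDS)}
    fields["PITCH"] = (word >> 16) & 0xFFFF
    return fields
```

With this encoding, a low byte of 0xe4 (binary 11 10 01 00) is the identity selection: W, Z, Y, X for the W, Z, Y, X fields.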
TEX_UNK238 (on Kelvin, only applies to the first 2 textures) [:NV30]:
* bits 0-31: ???
TEX_FILTER:
* bits 0-12: LOD_BIAS, signed number in 5.8 fixed-point format
* bits 13-15: TEX_FILTER_UNK13 [NV20:]
  - 0: UNK0
  - 1: UNK1
  - 2: UNK2
  - 3: UNK3 [NV25:]
* bits 16-21: MINIFY [NV20:]
* bits 24-27: MAGNIFY [NV20:]
* bit 28: SIGNED_B [NV20:]
* bit 29: SIGNED_G [NV20:]
* bit 30: SIGNED_R [NV20:]
* bit 31: SIGNED_A [NV20:]
* bits 24-26: MINIFY [:NV20]
* bits 28-30: MAGNIFY [:NV20]
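LOD_BIAS can be decoded with a sign extension. This sketch defines its own local `sext`, taking the sign-bit index as its second argument in the same way the notational-conventions pseudocode calls `sext`. It assumes two's complement encoding, which the word "signed" suggests but does not spell out. Under that assumption, the 5.8 format covers -16.0 up to just under +16.0 in steps of 1/256. The helper names are hypothetical.

```python
def sext(val, bit):
    """Sign-extend val, treating bit number `bit` as the sign bit."""
    val &= (1 << (bit + 1)) - 1
    return val - (1 << (bit + 1)) if (val >> bit) & 1 else val


def tex_filter_lod_bias(word):
    """LOD_BIAS from bits 0-12: 13-bit two's complement, 8 fractional bits (assumed)."""
    return sext(word & 0x1FFF, 12) / 256
```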
MINIFY can be one of:
* 1: NEAREST
* 2: LINEAR
* 3: NEAREST_MIPMAP_NEAREST
* 4: LINEAR_MIPMAP_NEAREST
* 5: NEAREST_MIPMAP_LINEAR
* 6: LINEAR_MIPMAP_LINEAR
* 7: ??? [NV20:]

And MAGNIFY can be:
* 1: NEAREST
* 2: LINEAR
* 4: ??? [NV20:]
TEX_RECT:
* bits 0-10: WIDTH [:NV20]
* bits 0-12: WIDTH [NV20:]
* bits 16-26: HEIGHT [:NV20]
* bits 16-28: HEIGHT [NV20:]

TEX_PALETTE:
* bit 0: DMA
  - 0: A
  - 1: B
* bits 2-3: ??? [NV20:]
* bits 6-31: OFFSET >> 6
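Because OFFSET is stored pre-shifted by 6 in bits 6-31, the palette must be 64-byte aligned, and its byte offset can be recovered by masking instead of shifting. A small sketch (the helper name is made up for illustration; bits 2-3 are ignored because their meaning is unknown):

```python
def decode_tex_palette(word):
    """Decode TEX_PALETTE (hypothetical helper)."""
    return {
        "dma": "B" if word & 1 else "A",
        # bits 6-31 hold OFFSET >> 6, so clearing the low 6 bits yields the byte offset
        "offset": word & 0xFFFFFFC0,
    }
```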
TEX_ZCOMP [NV20:NV25]:
* bits 0-2: MODE - common for all textures, same values as ALPHA_FUNC
TEX_ZCOMP [NV25:NV30]:
* bits 0-2: TEX0_MODE
* bits 3-5: TEX1_MODE
* bits 6-8: TEX2_MODE
* bits 9-11: TEX3_MODE
On NV30, this bundle is gone and the ZCOMP mode is in TEX_WRAP instead.


Register combiner bundles

Todo: write me

* RC_FACTOR_A
* RC_FACTOR_B
* RC_IN_ALPHA
* RC_OUT_ALPHA
* RC_IN_COLOR
* RC_OUT_COLOR
* RC_CONFIG
* RC_FINAL_A
* RC_FINAL_B
* FOG_COLOR
* RC_FINAL_FACTOR

ROP bundles

Note: CONFIG_A, STENCIL_A and STENCIL_B predate bundles: they first appeared on NV4 as plain MMIO
registers. These early versions are described here as well.


CONFIG_A:
* bits 0-7: ALPHA_REF [:NV40] - moved to its own bundle on NV40
* bits 8-11: ALPHA_FUNC
On NV4:NV10, the values are:

  - 1: NEVER
  - 2: LESS
  - 3: EQUAL
  - 4: LEQUAL
  - 5: GREATER
  - 6: NOTEQUAL
  - 7: GEQUAL
  - 8: ALWAYS

On NV10 and up, they are:

  - 0: NEVER
  - 1: LESS
  - 2: EQUAL
  - 3: LEQUAL
  - 4: GREATER
  - 5: NOTEQUAL
  - 6: GEQUAL
  - 7: ALWAYS
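The two ALPHA_FUNC encodings list the same functions in the same order, and the NV4:NV10 encoding starts at 1. Converting an NV4-style value to the NV10 encoding is therefore a subtraction. In the sketch below, the helper names are hypothetical. Per the text, DEPTH_FUNC and STENCIL_FUNC share ALPHA_FUNC's values, so the same table applies to them. The value 0 is rejected in the NV4 encoding because the tables above leave it undefined.

```python
# Comparison functions in table order (shared by ALPHA_FUNC, DEPTH_FUNC, STENCIL_FUNC).
FUNCS = ["NEVER", "LESS", "EQUAL", "LEQUAL", "GREATER", "NOTEQUAL", "GEQUAL", "ALWAYS"]


def func_name(value, nv10_or_later):
    """Name an ALPHA_FUNC-style value; NV4:NV10 counts from 1, NV10 and up from 0."""
    index = value if nv10_or_later else value - 1
    if not 0 <= index < len(FUNCS):
        raise ValueError(f"invalid compare function value {value:#x}")
    return FUNCS[index]


def nv4_to_nv10(value):
    """Convert an NV4:NV10 compare function value to the NV10-and-up encoding."""
    return value - 1
```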


* bit 12: ALPHA_FUNC_ENABLE
* bit 14: DEPTH_TEST_ENABLE
* bits 16-19: DEPTH_FUNC - has the same values as ALPHA_FUNC
* bits 20-21: CULL_FACE [NV4:NV10]
  - 1: NONE
  - 2: FRONT
  - 3: BACK

Since there is no FRONT_FACE setting on NV4, FRONT is always CW. This was moved to the RASTER bundle
on Celsius.

* bit 22: DITHER_ENABLE
* bit 23: DEPTH_PERSPECTIVE_ENABLE [:NV40]
* bit 24: DEPTH_WRITE_ENABLE
* bit 25: STENCIL_WRITE_ENABLE [:NV40]
* bit 26: COLOR_MASK_A [:NV40] - moved to its own bundle on NV40, along with the following 3 bits.
* bit 27: COLOR_MASK_R [:NV40]
* bit 28: COLOR_MASK_G [:NV40]
* bit 29: COLOR_MASK_B [:NV40]
* bits 30-31: Z_FORMAT [NV4:NV10]
  - 1: FIXED
  - 2: FLOAT
  This was moved to the RASTER bundle on Celsius.
* bits 30-31: KELVIN_CONFIG_UNK28 [NV20:NV25]
  - 0: ???
  - 1: ???
  - 2: ???
  This was moved to CONFIG_B on NV25.
* bit 31: CELSIUS_UNK3F8 [NV17:NV20]
* bit 31: ??? [NV34, NV40:]
STENCIL_A:
* bit 0: STENCIL_ENABLE
* bit 1: STENCIL_BACK_ENABLE [NV30:]
* bits 4-7: STENCIL_FUNC - has the same values as ALPHA_FUNC
* bits 8-15: STENCIL_FUNC_REF
* bits 16-23: STENCIL_FUNC_MASK
* bits 24-31: STENCIL_MASK
STENCIL_B:
* bits 0-3: STENCIL_OP_FAIL
  - 1: KEEP
  - 2: ZERO
  - 3: REPLACE
  - 4: INCR
  - 5: DECR
  - 6: INVERT
  - 7: INCR_WRAP
  - 8: DECR_WRAP
* bits 4-7: STENCIL_OP_ZFAIL
* bits 8-11: STENCIL_OP_ZPASS
* bits 12-15: ??? [NV34, NV40:]
STENCIL_C [NV30:]:
* bits 0-7: STENCIL_BACK_MASK
* bits 8-11: STENCIL_BACK_OP_ZPASS
* bits 12-15: STENCIL_BACK_OP_ZFAIL
* bits 16-19: STENCIL_BACK_OP_FAIL
STENCIL_D [NV30:]:
* bits 0-7: STENCIL_BACK_FUNC_REF
* bits 8-15: STENCIL_BACK_FUNC_MASK
* bits 16-19: STENCIL_BACK_FUNC
CONFIG_B:
* bit 0: PROVOKING_VERTEX
  - 0: LAST
  - 1: FIRST
* bit 1: POINT_SPRITE_ENABLE [NV25:]

Todo: why is POINT_SMOOTH_ENABLE aliased here?

* bit 2: CELSIUS_CONFIG_UNK24
* bits 3-4: POINT_SPRITE_R_MODE [NV25:]
  - 0: ZERO
  - 1: R
  - 2: S
* bit 4: ??? [NV10:NV20] - no method appears to affect this bit
* bit 5: SPECULAR_ENABLE - this is also stored in XFMODE.
* bit 6: TEXTURE_PERSPECTIVE_ENABLE
* bit 7: SHADE_MODE
  - 0: FLAT
  - 1: SMOOTH
* bit 8: FOG_ENABLE - this is also stored in XFMODE.
* bit 9: POINT_PARAMS_ENABLE - this is also stored in XFMODE.
* bits 10-15: ??? [NV40:]
* bits 10-13: CELSIUS_CONFIG_UNKS [NV10:NV30]
  - 0: ???
  - 1: ???
* bits 14-15: CELSIUS_CONFIG_UNK28 [NV17:NV20]
* bits 16-18: FOG_MODE [NV20:]
  - 0: LINEAR
  - 1: EXP
  - 3: EXP2
  - 4: UNK_0804
  - 5: UNK_0802
  - 7: UNK_0803
  The low bit of this is also stored in XFMODE. On Celsius, the fog mode was stored in FE3D_MISC instead.
* bit 19: ??? [NV40:]
* bit 20: ZPASS_COUNTER_ENABLE [NV20:]
* bit 21: ??? [NV40:]
* bits 24-27: POINT_SPRITE_COORD_REPLACE [NV25:]
* bits 28-30: KELVIN_CONFIG_UNK28 [NV25:]
  - 0: ???
  - 1: ???
  - 2: ???
  - 3: ???
  This was moved from CONFIG_A.
* bit 31: KELVIN_UNKA0C [NV25:]
BLEND:
* bits 0-2: BLEND_EQUATION
  - 0: SUBTRACT
  - 1: REVERSE_SUBTRACT
  - 2: ADD
  - 3: MIN
  - 4: MAX
  - 5: UNKF005 [NV20:]
  - 6: UNKF006 [NV20:]
  - 7: UNKF007 [NV25:]
* bit 3: BLEND_FUNC_ENABLE [:NV40]
* bits 4-7: BLEND_FACTOR_SRC_0
  - 0x0: ZERO
  - 0x1: ONE
  - 0x2: SRC_COLOR
  - 0x3: ONE_MINUS_SRC_COLOR
  - 0x4: SRC_ALPHA
  - 0x5: ONE_MINUS_SRC_ALPHA
  - 0x6: DST_ALPHA
  - 0x7: ONE_MINUS_DST_ALPHA
  - 0x8: DST_COLOR
  - 0x9: ONE_MINUS_DST_COLOR
  - 0xa: SRC_ALPHA_SATURATE
  - 0xc: CONSTANT_COLOR
  - 0xd: ONE_MINUS_CONSTANT_COLOR
  - 0xe: CONSTANT_ALPHA
  - 0xf: ONE_MINUS_CONSTANT_ALPHA
* bits 8-11: BLEND_FACTOR_DST_0
* bits 12-15: COLOR_LOGIC_OP_OP [NV15:]
  - 0x0: CLEAR
  - 0x1: AND
  - 0x2: AND_REVERSE
  - 0x3: COPY
  - 0x4: AND_INVERSE
  - 0x5: NOOP
  - 0x6: XOR
  - 0x7: OR
  - 0x8: NOR
  - 0x9: EQUIV
  - 0xa: INVERT
  - 0xb: OR_REVERSE
  - 0xc: COPY_INVERTED
  - 0xd: OR_INVERTED
  - 0xe: NAND
  - 0xf: SET
* bit 16: COLOR_LOGIC_OP_ENABLE [NV15:]
* bits 17-19: BLEND_EQUATION_1 [NV40:]
* bits 20-23: BLEND_FACTOR_SRC_1 [NV30:]
* bits 24-27: BLEND_FACTOR_DST_1 [NV30:]
* bit 28: BLEND_FUNC_ENABLE [NV40:]
* bits 29-31: ??? [NV40:]
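The COLOR_LOGIC_OP_OP names match OpenGL's glLogicOp operations in the same order, up to spelling. Assume the hardware follows the usual OpenGL semantics, with s the incoming source color and d the destination. Each 4-bit value is then itself a truth table:

* bit 0 gives the result for s=1, d=1
* bit 1 gives the result for s=1, d=0
* bit 2 gives the result for s=0, d=1
* bit 3 gives the result for s=0, d=0

This is a different bit ordering from the 2-input masks in the notational conventions, where 0x8 is a & b rather than 0x1. The sketch below applies an op to whole words. The helper `logic_op` is hypothetical.

```python
def logic_op(op, src, dst, width=32):
    """Apply a COLOR_LOGIC_OP_OP value bitwise (assumes OpenGL glLogicOp semantics)."""
    full = (1 << width) - 1
    s, d = src & full, dst & full
    ns, nd = ~s & full, ~d & full
    result = 0
    if op & 0x1:
        result |= s & d    # s=1, d=1
    if op & 0x2:
        result |= s & nd   # s=1, d=0
    if op & 0x4:
        result |= ns & d   # s=0, d=1
    if op & 0x8:
        result |= ns & nd  # s=0, d=0
    return result
```

For example, `logic_op(0x6, 0xF0, 0xCC, width=8)` (XOR) gives 0x3C, and 0x5 (NOOP) returns the destination unchanged.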
BLEND COLOR: 
* bits 0-7: B 
* bits 8-15: G 
* bits 16-23: R 
* bits 24-31: A 
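BLEND_COLOR packs four 8-bit channels as 0xAARRGGBB. A packing and unpacking sketch with hypothetical helper names:

```python
def pack_blend_color(r, g, b, a):
    """Pack 8-bit channels into a BLEND_COLOR word (B in bits 0-7 ... A in bits 24-31)."""
    for c in (r, g, b, a):
        if not 0 <= c <= 0xFF:
            raise ValueError("channels must be 8-bit")
    return b | (g << 8) | (r << 16) | (a << 24)


def unpack_blend_color(word):
    """Return (r, g, b, a) from a BLEND_COLOR word."""
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF, (word >> 24) & 0xFF
```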
MULTISAMPLE:
* bit 0: MULTISAMPLE_ENABLE
* bit 4: ALPHA_TO_COVERAGE
* bit 8: ALPHA_TO_ONE
* bits 16-31: SAMPLE_COVERAGE


RASTER bundles 


RASTER:
* bits 0-1: POLYGON_MODE_FRONT
  - 0: FILL
  - 1: POINT
  - 2: LINE
* bits 2-3: POLYGON_MODE_BACK
* bit 4: POLYGON_STIPPLE_ENABLE [NV20:NV25]
  On NV25, this was moved to the LINE_STIPPLE bundle.
* bit 4: ??? [NV25:NV30]
* bit 4: RANKINE_UNK1450_UNK31 [NV30:NV40]
* bit 5: DEPTH_CLAMP_UNKS [NV20:]
* bit 6: POLYGON_OFFSET_POINT_ENABLE
* bit 7: POLYGON_OFFSET_LINE_ENABLE
* bit 8: POLYGON_OFFSET_FILL_ENABLE
* bit 9: POINT_SMOOTH_ENABLE [:NV30]
* bit 10: LINE_SMOOTH_ENABLE
* bit 11: POLYGON_SMOOTH_ENABLE
* bits 12-20: LINE_WIDTH
* bits 21-22: CULL_FACE
  - 1: FRONT
  - 2: BACK
  - 3: FRONT_AND_BACK
* bit 23: FRONT_FACE
  - 0: CW
  - 1: CCW
* bit 24: LIGHT_TWO_SIDE_ENABLE [NV20:] - also stored in XFMODE
* bits 25-27: CELSIUS_MTHD_UNK3F0 [NV20:]
  - 0: UNK0
  - 1: UNK1
  - 2: UNK2
  - 3: UNK3
  - 4: UNK4
  - 7: UNK0F
* bits 26-27: CELSIUS_MTHD_UNK3F0 [NV10:NV20]
  - 0: UNK0
  - 1: UNK1
  - 2: UNK2
  - 3: UNK3
* bit 28: CULL_FACE_ENABLE
* bit 29: Z_FORMAT
  - 0: FIXED
  - 1: FLOAT
* bits 30-31: CELSIUS_MTHD_UNK3F8 [NV10:NV20]
* bit 30: DEPTH_CLAMP_UNK0 [NV20:]
* bit 31: CLIP_RECT_MODE [NV20:]
  - 0: ???
  - 1: ???
  Before NV20, this was stored in FE3D_MISC.
LINE_STIPPLE:
* bit 0: POLYGON_STIPPLE_ENABLE
* bit 1: LINE_STIPPLE_ENABLE
* bits 8-15: LINE_STIPPLE_FACTOR
* bits 16-31: LINE_STIPPLE_PATTERN
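Bundles like RASTER and LINE_STIPPLE are easiest to inspect with a table-driven decoder. This sketch is hypothetical: the `extract` and `decode` helpers and the field tables are made up for illustration. The RASTER table covers only fields listed above without a generation tag. Values are returned raw; for example, the unit of LINE_WIDTH is not documented here.

```python
def extract(word, lo, hi):
    """Return bits lo..hi (inclusive) of word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word, fields):
    """Decode word using a list of (lo, hi, name) triples."""
    return {name: extract(word, lo, hi) for lo, hi, name in fields}


RASTER_FIELDS = [
    (0, 1, "POLYGON_MODE_FRONT"),
    (2, 3, "POLYGON_MODE_BACK"),
    (10, 10, "LINE_SMOOTH_ENABLE"),
    (12, 20, "LINE_WIDTH"),
    (21, 22, "CULL_FACE"),
    (23, 23, "FRONT_FACE"),
    (28, 28, "CULL_FACE_ENABLE"),
    (29, 29, "Z_FORMAT"),
]

LINE_STIPPLE_FIELDS = [
    (0, 0, "POLYGON_STIPPLE_ENABLE"),
    (1, 1, "LINE_STIPPLE_ENABLE"),
    (8, 15, "LINE_STIPPLE_FACTOR"),
    (16, 31, "LINE_STIPPLE_PATTERN"),
]
```

For instance, a RASTER word with both polygon modes set to 2 (LINE), CULL_FACE 2 (BACK), FRONT_FACE 1 (CCW) and CULL_FACE_ENABLE set decodes back to exactly those values.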
Misc bundles

POINT_SIZE:

On NV10:NV25, this is a 9-bit fixed-point number with 6 integer bits and 3 fractional bits. On NV25:, it is a
float32.
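A conversion sketch for POINT_SIZE. It assumes the 9-bit NV10:NV25 value sits in bits 0-8 and is unsigned; the text gives only the 6.3 split. The helper name is hypothetical.

```python
import struct


def decode_point_size(word, nv25_or_later):
    """Convert a raw POINT_SIZE value to a float (hypothetical helper)."""
    if nv25_or_later:
        # NV25 and up: the word is an IEEE single-precision bit pattern
        return struct.unpack("<f", struct.pack("<I", word & 0xFFFFFFFF))[0]
    # NV10:NV25: unsigned 6.3 fixed point, assumed to sit in bits 0-8
    return (word & 0x1FF) / 8
```

The largest fixed-point size is therefore 511/8 = 63.875, in steps of 0.125.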


2.9.16 XF: The vertex transform & lighting engine 


Contents:

XF overview

Contents

* XF overview
  - Introduction
  - Structure and operation
  - IDX2XF: the command interface
    * IDX command wrapping
  - VAB: vertex assembly buffer
    * Celsius
    * Kelvin and up
    * The passthrough slot
    * RDI access
    * VAB command


Introduction 


XF is a PGRAPH unit responsible for processing vertices before they are sent to the rasterizer. It first appeared on
NV10; before that, there was no transform engine, and the user supplied raw vertex data directly to the rasterizer.
On G80, it was replaced by the unified shader architecture. Curiously, it was also transplanted for use on
pre-Kepler Tegra GPUs.


The following versions of XF exist: 


1. NV10: the original incarnation of XF. It is accompanied by the lighting engine, LT. Together, they perform
fixed-function transform & lighting on incoming vertices. Supported features:

* computes eye-space, clip-space and window-space position
* can transform via a weighted combination of two matrices
* supports several texgen modes:
  - eye linear
  - object linear
  - sphere map
  - reflection map
  - normal map
  - emboss map
* performs texture matrix multiplication
* performs lighting calculations, making final primary and secondary colors out of position, normal, and
  input colors. Infinite, local, and spot lights are supported.
* computes or passes the fog coordinate, with radial or planar distance calculations
* computes the point size based on distance
* all of the above can be disabled in favor of a simple bypass mode

2. NV15: Bugfix version of NV10.


3. NV20: Introduces support for programmability, aka vertex shaders. If enabled, fixed-function processing is
disabled, and XF instead performs operations according to a user-provided program. Other features include:

* 16 input attributes that can be arbitrarily assigned when in programmable mode
* two-sided lighting is supported: all lighting calculations can be performed twice, with different
  parameters, outputting two sets of primary and secondary colors.
* weighting supports up to 4 matrices and 4 weights
* 4 sets of output texture coordinates are supported, and each set now includes 4 components.
* more flexibility in light material specification (every material property can be independently assigned to
  primary or secondary color)

4. NV25: Includes two XF units on the GPU, for double the processing power. Also has some minor changes in
context layout.


5. NV30:

* fixed-function viewport transform can now be performed in addition to programmable processing, avoiding
  the need to include it in the program manually
* some fixed-function geometric calculations have been moved from LT to XF, for greater precision
* a new Rankine ISA (a proper superset of Kelvin ISA), featuring:
  - condition code register and conditional execution
  - branching and subroutine calls
  - two address registers, which are now 4-component vectors
  - transcendental functions with reasonable precision
  - some minor new instructions
  - "take absolute value" modifier on all sources
  - bumped code and const memory size
* Kelvin ISA is supported as a compatibility mode, by converting instructions to the new format as they are
  uploaded
* 8 sets of output texture coordinates are supported
* changed ordering of input and output attributes
* up to 6 clip distances can be output by the user program, or computed by fixed-function hardware
* bypass mode has been removed (but can be trivially emulated by a simple vertex program)
* to prevent infinite loops, a configurable timeout was added

6. NV34: Minor revision, removing support for the alternate light attenuation mode.

7. NV40:

* LT is no longer present, and all fixed-function work is now performed on the main XF engine
* Kelvin ISA is no longer supported
* Rankine ISA is supported as a compatibility mode
* a new Curie ISA is introduced, which is not a proper superset of the previous two:
  - limited texture lookup capability (only unfiltered linear 2D FP32 textures are supported)
  - second condition code register
  - address registers can be pushed/popped on the stack
  - indirect addressing for inputs and outputs
  - saturation modifier on outputs
* programs are stored internally in a special native ISA which is a proper superset of both Rankine and Curie
  ISAs
* flexible mapping of output array to attributes
* XF state is now specified by pipeline bundles, like most other pipeline state; XFMODE is gone
* individual XF units are now called VPEs and are more independent of each other

8. NV41: Rankine compatibility has been removed:

* the fixed-function mode is completely gone
* Rankine ISA is no longer supported
* Curie ISA is now used directly as the native ISA

9. NV43: Shortened XFCTX from 0x220 words to 0x1d4 words.

10. NV44: unknown changes from NV43.

11. Tegra: derived from Curie, but not much is known.


Structure and operation 


The XF complex is in the main pipeline after the IDX complex (for Kelvin, this means after the FD unit) and before 
the VTX complex (aka the post-transform cache). It is made of the following parts: 


1. IDX2XF: Input interface from the IDX complex (for Kelvin, from the FD unit). XF receives all sorts of
   commands here.

2. XF2VTX: Output interface to the VTX complex. XF outputs processed vertices and passthrough data here. On
   Celsius, it is also used to implement state readback for context switching. Note that no commands are emitted
   on this interface: VTX instead takes commands directly from the IDX complex by a side FIFO (IDX2VTX) that
   bypasses the XF complex. Data will only be consumed from here by VTX when it's told what to expect via the
   IDX2VTX interface.

3. VAB: vertex attribute buffer. Serves as assembly space for data received on the IDX2XF interface. Has one
   128-bit slot for every input vertex attribute, plus one extra "passthrough" slot for assembling state updates.
   Data goes from here to IBUF or XFPR.

4. XFMODE [NV10:NV40] or bundle [NV40:] storage: Remembers the control bits for the whole XF complex.

5. One or more VPEs, which do the main load of vertex processing. Each one has:

   1. XFPR [NV20:]: RAM containing user programs. Before NV40, shared between all VPEs.
   2. XFCTX: RAM containing parameters for fixed-function processing and user programs. Made of 4-element
      vectors of 32-bit floats. Before NV40, shared between all VPEs.
   3. Several copies of input/output buffers (6 copies on NV10:NV40, ??? on NV40:), one for each in-flight
      vertex:
      1. IBUF: contains input attributes of the vertex
      2. TBUF: contains output attributes of the vertex (at least the subset computed before LT).
      3. WBUF [NV10:NV30]: contains outputs to be consumed by the LT unit for lighting calculations, made
         of 3-element vectors of 22-bit floating-point numbers.
      4. VBUF [NV10:NV30]: a second buffer like WBUF.
      5. UBUF [NV30:NV40]: like WBUF/VBUF on earlier GPUs, but now contains 5-element vectors.
      6. STPOS [NV20:NV40]: a shadow copy of the first output attribute.
      7. SIPOS [NV25:NV40?]: a shadow copy of the first input attribute ???
   4. XFREG: Temporary register file.
   5. Control unit: contains PC, condition code, address registers, call stack, and fixed-function program
      sequencer. Can control processing of up to 3 vertices at a time, in SMT fashion.
   6. MLU: the multiplication execution unit. Can do 4 32-bit floating-point multiplies every cycle.
   7. ALU: the addition execution unit. Can do 3 [NV10:NV20] or 4 [NV20:] 32-bit 2-input floating-point
      sums, or a single 4-input sum every cycle. Can also do comparisons and other simple operations.
   8. ILU: the inverse execution unit. Can do one approximate reciprocal or reciprocal square root per two
      cycles. On NV20:, can also do low-precision exponential and logarithm approximations.
   9. MFU [NV30:]: the multi-function unit. Can compute EX2, LG2, SIN, COS with reasonable precision.

6. The LT unit [NV10:NV40], computing final vertex colors in fixed-function mode (as well as point size and fog
   before NV30). Uses a lower-precision 22-bit floating-point format. Made of:

   1. LTCTX: RAM containing parameters for fixed-function processing (like XFCTX). Made of 3-element
      vectors of 22-bit floats. On NV25:NV30, split into two RAMs: LTCTX_A and LTCTX_B.
   2. Control unit: steps through the LT microcode, processing up to 3 vertices at a time in SMT fashion.
   3. MLU: can perform 3 float multiplications per cycle.
   4. ALU: can perform 3 float additions or one 3-input sum per cycle.
   5. MAC0 and MAC2: perform scalar float multiply-accumulate operations. On NV30:, MAC0 can only do
      accumulate (no multiplication).
   6. LTC0 (for MAC0) [NV10:NV30] and LTC2 (for MAC2): RAMs containing multiplication factors for the
      MACs. Made of 22-bit floats.
   7. LTC1 (for MAC0) and LTC3 (for MAC2): RAMs containing additive factors for the MACs. Made of
      22-bit floats.
   8. ILU: performs very approximate reciprocal, reciprocal square, and some misc operations.
   9. LTREG: the temporary register file.

The VAB, XFCTX, XFPR, LTCTX, and LTC* RAMs need to be context-switched. On Celsius, this is done via the
readback functionality. On Kelvin and Rankine, they can be accessed via the RDI interface (done automatically by the
hardware context switch). On Curie, they can be context-switched by the context microcode.


All input/output and computation is performed on 32-bit or 22-bit floats — vertex attributes read from different formats 
are converted by IDX, and output attributes that require different formats are converted by VTX. The 32-bit floats are 
in IEEE single-precision format with some minor modifications: 


* denormals are not supported (and are considered equal to 0). 


* there is no distinction between QNaNs and SNaNs (since there are no traps in XF, all NaNs are effectively quiet).
Whenever a NaN is created, the value 0x7fffffff is used.


The 22-bit float format is used by computations in the LT unit, and works like the 32-bit float format with the low 10 
bits cut off (and assumed to be 0). 
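The float rules above can be modelled on raw bit patterns. The helper names in this sketch are hypothetical, and it makes several assumptions:

* Denormals are flushed to zero with the sign bit kept; the text does not say what happens to the sign.
* Every NaN collapses to 0x7fffffff. The text only says this value is used when a NaN is created, so applying it to incoming NaNs is an assumption.
* For the 22-bit format, "cut off" is modelled as truncation, that is, clearing the low 10 bits without rounding.

```python
import struct

XF_NAN = 0x7FFFFFFF  # the NaN value XF produces


def xf_f32(bits):
    """Approximate XF's view of a 32-bit float bit pattern (sketch)."""
    bits &= 0xFFFFFFFF
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        # denormals count as zero; keeping the sign bit is an assumption
        return bits & 0x80000000
    if exponent == 0xFF and mantissa:
        # every NaN collapses to the single canonical pattern (assumed for inputs too)
        return XF_NAN
    return bits


def xf_f22(bits):
    """22-bit LT float: the 32-bit pattern with its low 10 mantissa bits cleared."""
    return xf_f32(bits) & 0xFFFFFC00


def as_float(bits):
    """Interpret a 32-bit pattern as a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Canonicalizing before truncating matters: a NaN whose only mantissa bits sit in the low 10 bits would otherwise turn into an infinity in the 22-bit format.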


Todo: NV25, NV30 have RAMs unaccounted for. 


Todo: Curie still has switchable RAMs unaccounted for. 


IDX2XF: the command interface 


IDX2XF is the input interface to XF. IDX (or FD on Kelvin) can perform the following operations here:

* write command: contains a 4-bit command type, an address (10 to 14 bits wide, depending on the GPU) and a
  32-bit or 64-bit payload. Depending on the address, it can update a piece of XF state, request a data passthrough
  to the XF2VTX interface, or start a vertex state program.
* read command [Celsius only]: contains a command type and an address, like a write command. Requests a
  readback of a piece of state to the XF2VTX interface. Used to implement context switching (badly), not used
  otherwise.
* vertex trigger: starts processing a vertex, which will be output on the XF2VTX interface when fully processed.

The addresses for commands are usually constructed as follows:

* bits 0-1: always 0 (ie. all addresses are word-aligned).
* bits 2-3: select a 32-bit word in a 128-bit vector. 0 is the highest word (or the X component), while 3 is the
  lowest word (or the W component).
* bits 4-9 [NV10:NV20], 4-11 [NV20:NV30], 4-12 [NV30:NV40], or 4-13 [NV40:]: select the 128-bit vector in
  a space.
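A sketch of building an IDX2XF command address from a vector index and a component. The dictionary keys stand for the generation ranges above; for example, "NV20" means [NV20:NV30]. The names are hypothetical.

```python
# Width of the vector-index field (bits 4 and up) per generation range.
VECTOR_BITS = {
    "NV10": 6,   # bits 4-9   [NV10:NV20]
    "NV20": 8,   # bits 4-11  [NV20:NV30]
    "NV30": 9,   # bits 4-12  [NV30:NV40]
    "NV40": 10,  # bits 4-13  [NV40:]
}

# Word 0 is the X component (highest word), word 3 is W (lowest word).
COMPONENT_WORD = {"X": 0, "Y": 1, "Z": 2, "W": 3}


def xf_address(gen, vector, component):
    """Build an IDX2XF command address (hypothetical helper)."""
    bits = VECTOR_BITS[gen]
    if not 0 <= vector < (1 << bits):
        raise ValueError(f"vector index {vector:#x} does not fit in {bits} bits")
    # bits 0-1 stay zero (word-aligned), bits 2-3 select the word
    return (vector << 4) | (COMPONENT_WORD[component] << 2)
```

The resulting addresses are 10, 12, 13 and 14 bits wide respectively, which matches the 10 to 14-bit address range given for write commands.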


Read commands always target a 32-bit word, which will be read and delivered to XF2VTX interface. If the address is 
not valid for reading, XF will ignore the read command and deliver nothing to VTX. This will cause VTX to hang, in 
turn hanging FE3D, the PCI bus, the CPU, and the whole machine. Don't do that. 


Write commands can target a 32-bit word, or an aligned pair of 32-bit words. Since XF internal paths are mostly
128-bit wide, several write commands are usually needed to perform a single operation. Thus, for most commands,
writing to words 0-2 merely stores the payload in the VAB passthrough slot, while writing to word 3 completes the
128-bit vector in the VAB and sends it downstream.

Note that XF is, in many ways, a big-endian creature (though not consistently so). Since most of the GPU follows
little-endian design, this leads to things looking reversed in many places (in particular, when RDI is accessed). You
have been warned...


The following command types exist: 


0x0: NOP. Writes store the payload in the VAB passthrough slot and do nothing. Not readable.

0x1: VAB. Writes or reads VAB words. Used by IDX to upload input vertex attributes.

0x2: XFPR [NV20:]. Writes program instructions to the XFPR RAM (possibly with ISA encoding conversion),
assembling them in VAB.

0x4: PARAM [NV20:NV41?]. Writes the VAB passthrough slot, does nothing else. Used together with the RUN
command to pass a parameter to a vertex state program.

0x5: PASSTHRU. Passes its payload through VAB, IBUF and TBUF to the XF2VTX interface. This command
is used by IDX along with the BUNDLE command on the IDX2VTX interface to send bundles to the VTX
complex. Using it without the accompanying IDX2VTX command will desync and hang VTX, so don't do that.
Not readable.

0x6: RUN [NV20:NV41?]. Starts execution of a vertex state program, copying its parameter from the
passthrough VAB slot to the IBUF. Meant to be used with the PARAM command. The low bits of the
payload contain the starting PC of the vertex state program.

0x7: MODE [NV10:NV40]. Assembles a vector and sends it to the internal XFMODE storage. Not readable.

0x8: XTRA [NV30:NV41]. Assembles a vector and sends it to the extra XFPR RAM slots.

0x9: XFCTX. Assembles a vector and sends it through IBUF to XFCTX. Readable.

0xa: LTCTX. Assembles a vector and sends it through IBUF and WBUF/VBUF to LTCTX. Readable.

0xb: LTC0 [NV10:NV30]. Goes through IBUF and WBUF/VBUF to LTC0. Readable.

0xc: LTC1. Goes through IBUF and WBUF/VBUF/UBUF to LTC1. Readable.

0xd: LTC2. Likewise.

0xe: LTC3. Likewise.

0xf: SYNC. Performs a full XF barrier: waits for all pending vertices to be processed before processing any
more commands. Not readable.
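The command types map naturally onto an enumeration. The sketch below, with hypothetical names throughout, also guards against issuing read commands for types not described as readable. This matters because XF silently drops such reads, which hangs VTX. The guard checks only the command type, not whether the address is valid or whether the type exists on a given generation. Keep in mind that read commands exist only on Celsius.

```python
from enum import IntEnum


class XFCommand(IntEnum):
    """IDX2XF command types; generation ranges as listed above."""
    NOP = 0x0
    VAB = 0x1
    XFPR = 0x2      # [NV20:]
    PARAM = 0x4     # [NV20:NV41?]
    PASSTHRU = 0x5
    RUN = 0x6       # [NV20:NV41?]
    MODE = 0x7      # [NV10:NV40]
    XTRA = 0x8      # [NV30:NV41]
    XFCTX = 0x9
    LTCTX = 0xA
    LTC0 = 0xB      # [NV10:NV30]
    LTC1 = 0xC
    LTC2 = 0xD
    LTC3 = 0xE
    SYNC = 0xF


# Only the types explicitly described as readable above.
READABLE = {XFCommand.VAB, XFCommand.XFCTX, XFCommand.LTCTX,
            XFCommand.LTC0, XFCommand.LTC1, XFCommand.LTC2, XFCommand.LTC3}


def check_read(cmd_type):
    """Refuse read commands that XF would silently drop (which hangs VTX)."""
    cmd = XFCommand(cmd_type)  # raises ValueError for the unused type 0x3
    if cmd not in READABLE:
        raise ValueError(f"{cmd.name} is not readable")
    return cmd
```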


XF commands will be emitted by IDX in the following circumstances: 


* whenever vertex data is submitted by any means (through vertex buffers, inline data, or immediate mode), the
  corresponding VAB write command will be sent to XF.
* whenever a bundle command is processed by IDX, the bundle will be submitted as payload in the PASSTHRU
  command, and a corresponding bundle token will be emitted on the IDX2VTX interface.
* whenever a "submit XF command" IDX command is received on the FE2IDX interface, either from method execution
  or from the PIPE MMIO register.


Todo: None of the above is certain on Curie. 


IDX command wrapping 


The FE can submit commands to XF by wrapping them in IDX commands and sending them on the FE2IDX interface.
When IDX sees such a wrapped command, it will unwrap it at the last stage of processing and emit it on the IDX2XF
interface. This functionality is used by FE when executing methods that update XF context, and can be used by the
driver directly through the PIPE MMIO register as well.

On Celsius, the wrapped command addresses are:
* bits 0-9: XF address
* bits 10-13: XF command type
* bit 14: set to 1 (identifies wrapped XF command).
On Kelvin:
* bits 0-11: XF address
* bits 12-15: XF command type
* bit 16: set to 1.
On Rankine:
* bits 0-12: XF address
* bits 13-16: XF command type
* bit 17: set to 1.
On Curie:
* bits 0-13: XF address
* bits 14-17: XF command type
* target code: set to 3?


Todo: Figure out how this works on Curie. 
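On Celsius, Kelvin and Rankine the layout follows one pattern: the XF address comes first, then the 4-bit command type, then the single "wrapped XF command" flag bit. A sketch of wrapping and unwrapping under that reading follows; the names are hypothetical. Curie is left out because how its target code is encoded is not known.

```python
# Width of the XF address field in wrapped commands; the type follows it,
# and the "wrapped XF command" flag bit follows the type.
ADDR_BITS = {"celsius": 10, "kelvin": 12, "rankine": 13}


def wrap_xf_command(gen, cmd_type, xf_addr):
    """Build a wrapped IDX command address (hypothetical helper, Curie excluded)."""
    addr_bits = ADDR_BITS[gen]
    if not 0 <= xf_addr < (1 << addr_bits):
        raise ValueError("XF address out of range")
    if not 0 <= cmd_type < 16:
        raise ValueError("command type is 4 bits")
    return xf_addr | (cmd_type << addr_bits) | (1 << (addr_bits + 4))


def unwrap_xf_command(gen, wrapped):
    """Split a wrapped command back into (command type, XF address)."""
    addr_bits = ADDR_BITS[gen]
    if not (wrapped >> (addr_bits + 4)) & 1:
        raise ValueError("not a wrapped XF command")
    return (wrapped >> addr_bits) & 0xF, wrapped & ((1 << addr_bits) - 1)
```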


VAB: vertex assembly buffer 


VAB is the front gate to the XF complex. Its purpose is twofold: 


1. Keeping track of the last submitted value of every input vertex attribute, whether it comes from immediate data,
inline data, or a vertex buffer.


2. Assembling 32-bit or 64-bit input words into 128-bit vectors [NV10:NV40]. 


Whenever IDX signals that a vertex is to be processed, the contents of the VAB (except for the passthrough slot) are 
copied to an IBUF slot for processing, and data for the next vertex can be loaded to the VAB while XF is working on 
the previous one(s) in IBUF. 


Celsius 


On Celsius, VAB is made of 8 128-bit vectors, which are in turn made of 4 32-bit words. The first 7 vectors correspond 
more or less to the first 7 vertex attributes recognized by IDX, while the last one is special: 


* 0: OPOS, the object position.
* 1: COL0, the primary color. The X, Y, Z, W components correspond to the R, G, B, A components of the color.
* 2: COL1F, the secondary color and fog coordinate. The first three components (X, Y, Z) correspond to the R, G, B
  components of the secondary color, while component W corresponds to the fog factor.
* 3: TXC0, the texture 0 coordinates.
* 4: TXC1, the texture 1 coordinates.
* 5: NRML, the normal. Component W is effectively unused.
* 6: WGHT, the weight (used for transform matrix interpolation), stored in component X. Components Y, Z, W
  are effectively unused.
* 7: PASS, the passthrough slot, used to assemble full vectors for commands other than VAB.


Kelvin and up 


On Kelvin and Rankine, VAB is made of 17 128-bit vectors: 
* 0-15: Generic input vertex attributes, corresponding directly to the ones used by IDX. 
* 16: PASS, the passthrough slot. 


On Curie, VAB is made of 16 128-bit vectors, corresponding directly to the input vertex attributes (there is no 
passthrough slot). 


If the fixed function transformation is used on Kelvin, the input attributes have the following interpretation: 
* 0: OPOS.
* 1: WGHT, a vector of up to 4 weights used for transform matrix interpolation.
* 2: NRML (only X, Y, Z are used).
* 3: COL0.
* 4: COL1 (only X, Y, Z are used).
* 5: FOGC, the fog coordinate (only X is used).
* 6-8: not used.
* 9-12: TXC0-TXC3, the texture coordinates.
* 13-15: not used.

On Rankine and Curie, the interpretation for fixed-function is:
* 0: OPOS.
* 1: WGHT.
* 2: NRML.
* 3: COL0.
* 4: COL1.
* 5: FOGC.
* 6-7: not used.
* 8-15: TXC0-TXC7.


The passthrough slot 


The passthrough slot is used by commands that upload data into XF (other than VAB commands) to assemble the 
full 128-bit value from 32-bit or 64-bit pieces. All write commands of the relevant types write their payload to the 
corresponding 32-bit component (or component pair) of the passthrough slot, then (on the final component, or for 
some commands, on any component) send the value of the whole passthrough slot downstream. This includes the
following commands:

* NOP (though the written data is ignored in this case)
* SYNC (data is likewise ignored)
* XFPR
* PARAM (merely gathers the components, does not send them anywhere)
* RUN (doesn't write the slot, merely reads the value left by the PARAM command)
* PASSTHRU
* XTRA
* MODE
* XFCTX
* LTC*


Todo: How are things assembled on Curie? 


RDI access 


On Kelvin and Rankine, VAB can be accessed through RDI as space 0x15. This space is made of 128-bit little-endian
quadwords. When writing, a complete 128-bit quadword must be written at once, or data will be damaged. Note that
the 32-bit words inside quadwords are effectively in reverse order wrt IDX2XF commands (since IDX2XF transfers
the high word as word 0). In other words:

* bits 0-31 (RDI address 0x0 modulo 0x10): W component, IDX2XF word 3
* bits 32-63 (RDI address 0x4 modulo 0x10): Z component, IDX2XF word 2
* bits 64-95 (RDI address 0x8 modulo 0x10): Y component, IDX2XF word 1
* bits 96-127 (RDI address 0xc modulo 0x10): X component, IDX2XF word 0
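The word reversal can be captured in a few lines. This sketch assumes that VAB slot n occupies the quadword at RDI address n * 0x10; the text implies this but does not state it outright. It also packs and unpacks whole quadwords so that each RDI write covers all 16 bytes at once. All names are hypothetical.

```python
import struct


def rdi_offset(slot, word):
    """Byte offset in RDI space 0x15 of IDX2XF word `word` (0 = X ... 3 = W).

    Assumes VAB slot n occupies the quadword at n * 0x10.
    """
    if not 0 <= word <= 3:
        raise ValueError("word must be 0-3")
    return slot * 0x10 + (3 - word) * 4


def quadword_to_xyzw(data):
    """Unpack 16 little-endian bytes read through RDI into (X, Y, Z, W)."""
    w, z, y, x = struct.unpack("<4I", data)
    return x, y, z, w


def xyzw_to_quadword(x, y, z, w):
    """Pack (X, Y, Z, W) into one 16-byte quadword for a single RDI write."""
    return struct.pack("<4I", w, z, y, x)
```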


VAB command 


The VAB command (type 0x1) can be sent by IDX to write or read VAB slots. To simplify writing attributes shorter
than 4 components, the write command has some special behavior.

On Celsius, the write command works like this:
1. If component X or Y of slots 0, 1, 3, or 4 (OPOS, COL0, TXC*) is being written:
   1. On NV15 and up, set component Y to 0.
   2. Set component Z to 0.
   3. Set component W to 0x3f800000 (1.0f).
2. Set the selected component(s) of the selected slot to the submitted value(s).

On Kelvin and up, the write command works like this:
1. If component X of any slot other than the passthrough one is being written:
   1. Set component Y to 0.
   2. Set component Z to 0.
   3. Set component W to 0x3f800000 (1.0f).
2. Set the selected component(s) of the selected slot to the submitted value(s).
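Both sets of steps can be simulated on a list of 4-word slots. This sketch handles single 32-bit writes only; aligned 64-bit pair writes are left out. It indexes components by IDX2XF word number (X = 0 ... W = 3). Function and constant names are hypothetical.

```python
X, Y, Z, W = range(4)                   # IDX2XF word numbers: word 0 is X
ONE_F32 = 0x3F800000                    # 1.0f
CELSIUS_DEFAULTED_SLOTS = (0, 1, 3, 4)  # OPOS, COL0, TXC0, TXC1


def vab_write_celsius(vab, slot, comp, value, nv15_or_later=True):
    """Apply one 32-bit VAB write using the Celsius rules (sketch)."""
    if comp in (X, Y) and slot in CELSIUS_DEFAULTED_SLOTS:
        if nv15_or_later:
            vab[slot][Y] = 0
        vab[slot][Z] = 0
        vab[slot][W] = ONE_F32
    vab[slot][comp] = value


def vab_write_kelvin(vab, slot, comp, value, pass_slot=16):
    """Apply one 32-bit VAB write using the Kelvin-and-up rules (sketch).

    pass_slot is 16 on Kelvin and Rankine; pass None for Curie, which has none.
    """
    if comp == X and slot != pass_slot:
        vab[slot][Y] = 0
        vab[slot][Z] = 0
        vab[slot][W] = ONE_F32
    vab[slot][comp] = value
```

On Kelvin and up, writing X and then Y of a 2-component attribute therefore leaves (x, y, 0.0, 1.0) in the slot, whatever it held before.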


XF context RAMs 


Contents

* XF context RAMs
  - XFCTX
  - LTCTX
  - LTC
  - Context setting methods


XFCTX 


Todo: intro? 


Table 13: XFCTX

NV10  | NV20  | NV30  | Name
0x08+ | 0x00+ | 0x3c+ | MATRIX_PROJ
-     | 0x04+ | 0x40+ | MATRIX_UNK440
0x00+ | 0x08+ | 0x44+ | MATRIX_MV0
0x04+ | 0x0c+ | 0x48+ | MATRIX_IMV0
0x0c+ | 0x10+ | 0x4c+ | MATRIX_MV1
0x10+ | 0x14+ | 0x50+ | MATRIX_IMV1
-     | 0x18+ | 0x54+ | MATRIX_MV2
-     | 0x1c+ | 0x58+ | MATRIX_IMV2
-     | 0x20+ | 0x5c+ | MATRIX_MV3
-     | 0x24+ | 0x60+ | MATRIX_IMV3
0x24  | 0x28  | 0x64  | LIGHT_0_POSITION
0x25  | 0x29  | 0x65  | LIGHT_1_POSITION
0x26  | 0x2a  | 0x66  | LIGHT_2_POSITION
0x27  | 0x2b  | 0x67  | LIGHT_3_POSITION
0x28  | 0x2c  | 0x68  | LIGHT_4_POSITION
0x29  | 0x2d  | 0x69  | LIGHT_5_POSITION
0x2a  | 0x2e  | 0x6a  | LIGHT_6_POSITION
0x2b  | 0x2f  | 0x6b  | LIGHT_7_POSITION
0x2c  | 0x30  | 0x6c  | LIGHT_0_SPOT_DIRECTION
0x2d  | 0x31  | 0x6d  | LIGHT_1_SPOT_DIRECTION
0x2e  | 0x32  | 0x6e  | LIGHT_2_SPOT_DIRECTION
0x2f  | 0x33  | 0x6f  | LIGHT_3_SPOT_DIRECTION
0x30  | 0x34  | 0x70  | LIGHT_4_SPOT_DIRECTION
0x31  | 0x35  | 0x71  | LIGHT_5_SPOT_DIRECTION
0x32  | 0x36  | 0x72  | LIGHT_6_SPOT_DIRECTION
0x33  | 0x37  | 0x73  | LIGHT_7_SPOT_DIRECTION
0x34  | 0x38  | 0x74  | LIGHT_EYE_POSITION
0x35  | -     | -     | CONST_REFLECT_TWO
0x36  | -     | -     | CONST_SPHERE_Z_ONE
0x37  | -     | -     | CONST_SPHERE_XY_HALF
0x38  | 0x39  | 0x75  | FOG_PLANE
-     | 0x3a  | 0x76  | VIEWPORT_SCALE
0x39  | 0x3b  | 0x77  | VIEWPORT_TRANSLATE
0x3a  | -     | -     | CONST_WEIGHT_ONE
-     | 0x3c  | 0x78  | KELVIN_UNK16E0
-     | 0x3d  | 0x79  | KELVIN_UNK16F0
-     | 0x3e  | 0x7a  | KELVIN_UNK1700
-     | 0x3f  | 0x7b  | KELVIN_UNK16D0
0x14  | 0x40  | 0x7c  | TEX_0_GEN_S
0x15  | 0x41  | 0x7d  | TEX_0_GEN_T
0x16  | 0x42  | 0x7e  | TEX_0_GEN_R
0x17  | 0x43  | 0x7f  | TEX_0_GEN_Q
0x18+ | 0x44+ | 0x80+ | MATRIX_TX0
0x1c  | 0x48  | 0x84  | TEX_1_GEN_S
0x1d  | 0x49  | 0x85  | TEX_1_GEN_T
0x1e  | 0x4a  | 0x86  | TEX_1_GEN_R
0x1f  | 0x4b  | 0x87  | TEX_1_GEN_Q
0x20+ | 0x4c+ | 0x88+ | MATRIX_TX1
-     | 0x50  | 0x8c  | TEX_2_GEN_S
-     | 0x51  | 0x8d  | TEX_2_GEN_T
-     | 0x52  | 0x8e  | TEX_2_GEN_R
-     | 0x53  | 0x8f  | TEX_2_GEN_Q
-     | 0x54+ | 0x90+ | MATRIX_TX2
-     | 0x58  | 0x94  | TEX_3_GEN_S
-     | 0x59  | 0x95  | TEX_3_GEN_T
-     | 0x5a  | 0x96  | TEX_3_GEN_R
-     | 0x5b  | 0x97  | TEX_3_GEN_Q
-     | 0x5c+ | 0x98+ | MATRIX_TX3
-     | 0x60+ | 0x9c+ | USER
-     | -     | 0x00  | TEX_4_GEN_S
-     | -     | 0x01  | TEX_4_GEN_T
-     | -     | 0x02  | TEX_4_GEN_R
-     | -     | 0x03  | TEX_4_GEN_Q
-     | -     | 0x04+ | MATRIX_TX4
-     | -     | 0x08  | TEX_5_GEN_S
-     | -     | 0x09  | TEX_5_GEN_T
-     | -     | 0x0a  | TEX_5_GEN_R
-     | -     | 0x0b  | TEX_5_GEN_Q
-     | -     | 0x0c+ | MATRIX_TX5
-     | -     | 0x10  | TEX_6_GEN_S
-     | -     | 0x11  | TEX_6_GEN_T
-     | -     | 0x12  | TEX_6_GEN_R
-     | -     | 0x13  | TEX_6_GEN_Q
-     | -     | 0x14+ | MATRIX_TX6
-     | -     | 0x18  | TEX_7_GEN_S
-     | -     | 0x19  | TEX_7_GEN_T
-     | -     | 0x1a  | TEX_7_GEN_R
-     | -     | 0x1b  | TEX_7_GEN_Q
-     | -     | 0x1c+ | MATRIX_TX7
-     | -     | 0x20  | USER_CLIP_PLANE_0
-     | -     | 0x21  | USER_CLIP_PLANE_1
-     | -     | 0x22  | USER_CLIP_PLANE_2
-     | -     | 0x23  | USER_CLIP_PLANE_3
-     | -     | 0x24  | USER_CLIP_PLANE_4
-     | -     | 0x25  | USER_CLIP_PLANE_5
-     | -     | 0x26  | POINT_PARAMS_A
-     | -     | 0x27  | {x: POINT_PARAMS_B[0], y: POINT_PARAMS_C, z: POINT_PARAMS_D}
-     | -     | 0x28  | LIGHT_0_DIRECTION
-     | -     | 0x29  | LIGHT_1_DIRECTION
-     | -     | 0x2a  | LIGHT_2_DIRECTION
-     | -     | 0x2b  | LIGHT_3_DIRECTION
-     | -     | 0x2c  | LIGHT_4_DIRECTION
-     | -     | 0x2d  | LIGHT_5_DIRECTION
-     | -     | 0x2e  | LIGHT_6_DIRECTION
-     | -     | 0x2f  | LIGHT_7_DIRECTION
-     | -     | 0x30  | LIGHT_0_HALF_VECTOR_ATTENUATION
-     | -     | 0x31  | LIGHT_1_HALF_VECTOR_ATTENUATION
-     | -     | 0x32  | LIGHT_2_HALF_VECTOR_ATTENUATION
-     | -     | 0x33  | LIGHT_3_HALF_VECTOR_ATTENUATION
-     | -     | 0x34  | LIGHT_4_HALF_VECTOR_ATTENUATION
-     | -     | 0x35  | LIGHT_5_HALF_VECTOR_ATTENUATION
-     | -     | 0x36  | LIGHT_6_HALF_VECTOR_ATTENUATION
-     | -     | 0x37  | LIGHT_7_HALF_VECTOR_ATTENUATION
-     | -     | 0x38  | LT_UNK17E0
-     | -     | 0x39  | ???
-     | -     | 0x3a  | ???
-     | -     | 0x3b  | ???
0x3b  | -     | -     | [unused]
LTCTX 
Todo: intro? 
NV10 | NV20 | NV30 | Name 
0x00 | 0x00 | 0x00 | LIGHT_0_AMBIENT_COLOR
0x01 | 0x01 | 0x01 | LIGHT_0_DIFFUSE_COLOR
0x02 | 0x02 | 0x02 | LIGHT_0_SPECULAR_COLOR
0x03 | 0x03 | - | LIGHT_0_HALF_VECTOR_ATTENUATION
0x04 | 0x04 | - | LIGHT_0_DIRECTION
- | 0x05 | 0x03 | LIGHT_0_BACK_AMBIENT_COLOR
- | 0x06 | 0x04 | LIGHT_0_BACK_DIFFUSE_COLOR
- | 0x07 | 0x05 | LIGHT_0_BACK_SPECULAR_COLOR
0x05 | 0x08 | 0x06 | LIGHT_1_AMBIENT_COLOR
0x06 | 0x09 | 0x07 | LIGHT_1_DIFFUSE_COLOR
0x07 | 0x0a | 0x08 | LIGHT_1_SPECULAR_COLOR
0x08 | 0x0b | - | LIGHT_1_HALF_VECTOR_ATTENUATION
0x09 | 0x0c | - | LIGHT_1_DIRECTION
- | 0x0d | 0x09 | LIGHT_1_BACK_AMBIENT_COLOR
- | 0x0e | 0x0a | LIGHT_1_BACK_DIFFUSE_COLOR
- | 0x0f | 0x0b | LIGHT_1_BACK_SPECULAR_COLOR
0x0a | 0x10 | 0x0c | LIGHT_2_AMBIENT_COLOR
0x0b | 0x11 | 0x0d | LIGHT_2_DIFFUSE_COLOR
0x0c | 0x12 | 0x0e | LIGHT_2_SPECULAR_COLOR
0x0d | 0x13 | - | LIGHT_2_HALF_VECTOR_ATTENUATION
0x0e | 0x14 | - | LIGHT_2_DIRECTION
- | 0x15 | 0x0f | LIGHT_2_BACK_AMBIENT_COLOR
- | 0x16 | 0x10 | LIGHT_2_BACK_DIFFUSE_COLOR
- | 0x17 | 0x11 | LIGHT_2_BACK_SPECULAR_COLOR
0x0f | 0x18 | 0x12 | LIGHT_3_AMBIENT_COLOR
0x10 | 0x19 | 0x13 | LIGHT_3_DIFFUSE_COLOR
0x11 | 0x1a | 0x14 | LIGHT_3_SPECULAR_COLOR
0x12 | 0x1b | - | LIGHT_3_HALF_VECTOR_ATTENUATION
0x13 | 0x1c | - | LIGHT_3_DIRECTION
- | 0x1d | 0x15 | LIGHT_3_BACK_AMBIENT_COLOR
- | 0x1e | 0x16 | LIGHT_3_BACK_DIFFUSE_COLOR
- | 0x1f | 0x17 | LIGHT_3_BACK_SPECULAR_COLOR
0x14 | 0x20 | 0x18 | LIGHT_4_AMBIENT_COLOR
0x15 | 0x21 | 0x19 | LIGHT_4_DIFFUSE_COLOR
0x16 | 0x22 | 0x1a | LIGHT_4_SPECULAR_COLOR
0x17 | 0x23 | - | LIGHT_4_HALF_VECTOR_ATTENUATION
0x18 | 0x24 | - | LIGHT_4_DIRECTION
- | 0x25 | 0x1b | LIGHT_4_BACK_AMBIENT_COLOR
- | 0x26 | 0x1c | LIGHT_4_BACK_DIFFUSE_COLOR
- | 0x27 | 0x1d | LIGHT_4_BACK_SPECULAR_COLOR
0x19 | 0x28 | 0x1e | LIGHT_5_AMBIENT_COLOR
0x1a | 0x29 | 0x1f | LIGHT_5_DIFFUSE_COLOR
0x1b | 0x2a | 0x20 | LIGHT_5_SPECULAR_COLOR
0x1c | 0x2b | - | LIGHT_5_HALF_VECTOR_ATTENUATION
0x1d | 0x2c | - | LIGHT_5_DIRECTION
- | 0x2d | 0x21 | LIGHT_5_BACK_AMBIENT_COLOR
- | 0x2e | 0x22 | LIGHT_5_BACK_DIFFUSE_COLOR
- | 0x2f | 0x23 | LIGHT_5_BACK_SPECULAR_COLOR
0x1e | 0x30 | 0x24 | LIGHT_6_AMBIENT_COLOR
0x1f | 0x31 | 0x25 | LIGHT_6_DIFFUSE_COLOR
0x20 | 0x32 | 0x26 | LIGHT_6_SPECULAR_COLOR
0x21 | 0x33 | - | LIGHT_6_HALF_VECTOR_ATTENUATION
0x22 | 0x34 | - | LIGHT_6_DIRECTION
- | 0x35 | 0x27 | LIGHT_6_BACK_AMBIENT_COLOR
- | 0x36 | 0x28 | LIGHT_6_BACK_DIFFUSE_COLOR
- | 0x37 | 0x29 | LIGHT_6_BACK_SPECULAR_COLOR
0x23 | 0x38 | 0x2a | LIGHT_7_AMBIENT_COLOR
0x24 | 0x39 | 0x2b | LIGHT_7_DIFFUSE_COLOR
0x25 | 0x3a | 0x2c | LIGHT_7_SPECULAR_COLOR
0x26 | 0x3b | - | LIGHT_7_HALF_VECTOR_ATTENUATION
0x27 | 0x3c | - | LIGHT_7_DIRECTION
- | 0x3d | 0x2d | LIGHT_7_BACK_AMBIENT_COLOR
- | 0x3e | 0x2e | LIGHT_7_BACK_DIFFUSE_COLOR
- | 0x3f | 0x2f | LIGHT_7_BACK_SPECULAR_COLOR
0x28 | - | - | ???
- | 0x40 | - | LT_UNK17E0
0x29 | 0x41 | 0x30 | LIGHT_MODEL_AMBIENT_COLOR
- | 0x42 | 0x31 | LIGHT_MODEL_BACK_AMBIENT_COLOR
0x2a | 0x43 | 0x32 | MATERIAL_FACTOR_RGB
- | 0x44 | 0x33 | MATERIAL_FACTOR_BACK_RGB
0x2b | 0x45 | - | FOG_COEFF
0x2c | - | - | CONST_ZERO
- | 0x46 | 0x34 | LT_UNK17D4
0x2d | 0x47 | - | POINT_PARAMS_A
0x2e | 0x48 | - | POINT_PARAMS_B
0x2f | - | - | [unused]
- | 0x49 | - | LT_UNK17EC
- | - | 0x35 | ???
- | - | 0x36 | VIEWPORT_TRANSLATE
- | - | 0x37 | VIEWPORT_SCALE


LTC 


Todo: intro? 


NV10 | NV20 | NV30 | Name


0.0x00 | - | - | [const 1.0]
0.0x01 | - | - | CONST_???
- | 0.0x00 | - | ???
- | 0.0x01 | - | ???
0.0x02 | 0.0x02 | - | MATERIAL_SHININESS_D
- | 0.0x03 | - | MATERIAL_BACK_SHININESS_D
1.0x00 | - | - | [const 0.0]
- | 1.0x00 | 1.0x00 | ???
1.0x01 | 1.0x01 | 1.0x01 | MATERIAL_SHININESS_A
- | 1.0x02 | 1.0x02 | MATERIAL_BACK_SHININESS_A
- | - | 1.0x03 | MATERIAL_SHININESS_D
- | - | 1.0x04 | MATERIAL_BACK_SHININESS_D
1.0x02 | 1.0x03 | - | POINT_PARAMS_C
1.0x03 | 1.0x04 | 1.0x05 | LIGHT_0_LOCAL_RANGE
1.0x04 | 1.0x05 | 1.0x06 | LIGHT_1_LOCAL_RANGE
1.0x05 | 1.0x06 | 1.0x07 | LIGHT_2_LOCAL_RANGE
1.0x06 | 1.0x07 | 1.0x08 | LIGHT_3_LOCAL_RANGE
1.0x07 | 1.0x08 | 1.0x09 | LIGHT_4_LOCAL_RANGE
1.0x08 | 1.0x09 | 1.0x0a | LIGHT_5_LOCAL_RANGE
1.0x09 | 1.0x0a | 1.0x0b | LIGHT_6_LOCAL_RANGE
1.0x0a | 1.0x0b | 1.0x0c | LIGHT_7_LOCAL_RANGE
1.0x0b | 1.0x0c | 1.0x0d | LIGHT_0_SPOT_CUTOFF_0
1.0x0c | 1.0x0d | 1.0x0e | LIGHT_1_SPOT_CUTOFF_0
1.0x0d | 1.0x0e | 1.0x0f | LIGHT_2_SPOT_CUTOFF_0
1.0x0e | 1.0x0f | 1.0x10 | LIGHT_3_SPOT_CUTOFF_0
1.0x0f | 1.0x10 | 1.0x11 | LIGHT_4_SPOT_CUTOFF_0
1.0x10 | 1.0x11 | 1.0x12 | LIGHT_5_SPOT_CUTOFF_0
1.0x11 | 1.0x12 | 1.0x13 | LIGHT_6_SPOT_CUTOFF_0
1.0x12 | 1.0x13 | 1.0x14 | LIGHT_7_SPOT_CUTOFF_0
2.0x00 | - | - | [const 1.0]
- | 2.0x00 | 2.0x00 | ???
2.0x01 | 2.0x01 | 2.0x01 | MATERIAL_SHININESS_B
- | 2.0x02 | 2.0x02 | MATERIAL_BACK_SHININESS_B
2.0x02 | 2.0x03 | 2.0x03 | MATERIAL_SHININESS_E
- | 2.0x04 | 2.0x04 | MATERIAL_BACK_SHININESS_E
2.0x03 | 2.0x05 | - | MATERIAL_SHININESS_F
- | 2.0x06 | - | MATERIAL_BACK_SHININESS_F
2.0x04 | 2.0x07 | 2.0x05 | LIGHT_0_SPOT_CUTOFF_1
2.0x05 | 2.0x08 | 2.0x06 | LIGHT_1_SPOT_CUTOFF_1
2.0x06 | 2.0x09 | 2.0x07 | LIGHT_2_SPOT_CUTOFF_1
2.0x07 | 2.0x0a | 2.0x08 | LIGHT_3_SPOT_CUTOFF_1
2.0x08 | 2.0x0b | 2.0x09 | LIGHT_4_SPOT_CUTOFF_1
2.0x09 | 2.0x0c | 2.0x0a | LIGHT_5_SPOT_CUTOFF_1
2.0x0a | 2.0x0d | 2.0x0b | LIGHT_6_SPOT_CUTOFF_1
2.0x0b | 2.0x0e | 2.0x0c | LIGHT_7_SPOT_CUTOFF_1
3.0x00 | - | - | [const 0.0]
- | 3.0x00 | 3.0x00 | ???
3.0x01 | 3.0x01 | - | POINT_PARAMS_D
3.0x02 | 3.0x02 | 3.0x01 | MATERIAL_SHININESS_C
- | 3.0x03 | 3.0x02 | MATERIAL_BACK_SHININESS_C
- | - | 3.0x03 | MATERIAL_SHININESS_F
- | - | 3.0x04 | MATERIAL_BACK_SHININESS_F
3.0x03 | 3.0x04 | 3.0x05 | LIGHT_0_SPOT_CUTOFF_2
3.0x04 | 3.0x05 | 3.0x06 | LIGHT_1_SPOT_CUTOFF_2
3.0x05 | 3.0x06 | 3.0x07 | LIGHT_2_SPOT_CUTOFF_2
3.0x06 | 3.0x07 | 3.0x08 | LIGHT_3_SPOT_CUTOFF_2
3.0x07 | 3.0x08 | 3.0x09 | LIGHT_4_SPOT_CUTOFF_2
3.0x08 | 3.0x09 | 3.0x0a | LIGHT_5_SPOT_CUTOFF_2
3.0x09 | 3.0x0a | 3.0x0b | LIGHT_6_SPOT_CUTOFF_2
3.0x0a | 3.0x0b | 3.0x0c | LIGHT_7_SPOT_CUTOFF_2
3.0x0b | 3.0x0c | 3.0x0d | MATERIAL_FACTOR_A
- | 3.0x0d | 3.0x0e | MATERIAL_FACTOR_BACK_A




nVidia Hardware Documentation, Release git 


Context setting methods 


Todo: write me 


XF mode selection 


Contents 


* XF mode selection
  - Introduction
  - XFMODE - Celsius
  - XFMODE - Kelvin & Rankine
  - Curie XF bundles
  - Mode setting methods


Introduction 


This document describes the mode bits controlling XF behavior. On NV10:NV40, such mode bits are gathered in
a 128-bit vector (or two vectors on Rankine) called XFMODE. XFMODE is loaded to XF via the IDX2XF MODE
command. FE3D keeps an MMIO-exposed shadow copy of the XFMODE vector(s), updating it as mode-affecting
methods are processed and sending a copy to XF every time it changes. The shadow copy is also used for context
switching. Due to the word endianness mismatch between the FE shadow copy / IDX2XF addresses and XF internal
commands, keeping track of the word positions can be rather confusing.


On NV40:, XFMODE no longer exists, and XF mode is instead controlled by state bundles like most other parts of the 
pipeline. 


XFMODE - Celsius 


On Celsius, XFMODE is a single 128-bit vector, with the following fields: 


* bits 0-31: XFMODE_A, the low word:
  - bit 0: TEX_0_ENABLE - if set, coordinates for texture 0 will be computed. Otherwise, texture unit 0 will
    be ignored.
  - bit 1: TEX_0_MATRIX_ENABLE - if set, enables transformation of texture 0 coordinates by the texture
    matrix. This must be set if texgen is used, or if perspective is disabled.
  - bit 2: TEX_0_PERSPECTIVE - if set, the final texture 0 coordinates will be multiplied by the final 1/w.
  - bits 3-5: TEX_0_GEN_S - selects how texture 0 coordinate s is generated.
  - bits 6-8: TEX_0_GEN_T
  - bits 9-11: TEX_0_GEN_R
  - bits 12-13: TEX_0_GEN_Q
  - bit 14: TEX_1_ENABLE
  - bit 15: TEX_1_MATRIX_ENABLE
  - bit 16: TEX_1_PERSPECTIVE
  - bits 17-19: TEX_1_GEN_S
  - bits 20-22: TEX_1_GEN_T
  - bits 23-25: TEX_1_GEN_R
  - bits 26-27: TEX_1_GEN_Q
  - bit 28: LIGHT_MODEL_LOCAL_VIEWER
  - bit 29: LIGHTING_ENABLE
  - bit 30: NORMALIZE_ENABLE
  - bit 31: FOG_ENABLE


* bits 32-63: XFMODE_B, the high word:
  - bits 0-1: LIGHT_MODE_0 - selects how light 0 behaves. One of:
    * 0: NONE - light is disabled. Note that if a light is disabled, all subsequent lights must be disabled as
      well.
    * 1: INFINITE
    * 2: LOCAL
    * 3: SPOTLIGHT
  - bits 2-3: LIGHT_MODE_1 - likewise for light 1.
  - bits 4-5: LIGHT_MODE_2
  - bits 6-7: LIGHT_MODE_3
  - bits 8-9: LIGHT_MODE_4
  - bits 10-11: LIGHT_MODE_5
  - bits 12-13: LIGHT_MODE_6
  - bits 14-15: LIGHT_MODE_7
  - bits 16-17: FOG_COORD - selects how the fog coordinate is computed. One of:
    * 0: PASS
    * 1: DIST_RADIAL
    * 2: DIST_ORTHOGONAL
    * 3: DIST_ORTHOGONAL_ABS
  - bit 18: LIGHT_MODEL_UNK? - ???
  - bit 19: LIGHT_MODEL_VERTEX_SPECULAR - ???
  - bit 20: LIGHT_MODEL_SEPARATE_SPECULAR - ???
  - bits 21-24: LIGHT_MATERIAL - ???
  - bit 25: POINT_PARAMS_ENABLE - if set, XF&LT compute point size. Otherwise, constant point size is used.
  - bit 27: WEIGHT_ENABLE - if set, eye space transformation matrices will be blended together using the
    input weight.
  - bit 28: BYPASS - if set, XF&LT are in bypass mode, and only a small set of computations will be
    performed. Otherwise, full transform and lighting is enabled.
  - bit 29: ORIGIN - selects the viewport offset used in bypass mode. One of:
    * 0: CORNER
    * 1: CENTER
* bits 64-127: unused.


Where texgen modes can be one of:

* 0: PASS - input coordinate is passed through.
* 1: EYE_LINEAR
* 2: OBJECT_LINEAR
* 3: SPHERE_MAP (only supported on s and t)
* 4: NORMAL_MAP (only supported on s, t, r)
* 5: REFLECTION_MAP (only supported on s, t, r)
* 6: EMBOSS_MAP (only supported on s of texture 1, but if used affects all coordinates)
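The per-coordinate restrictions in the texgen list can be captured in a small validity check. This is a minimal sketch: `texgen_allowed` and `RESTRICTED_COORDS` are hypothetical names, and the check works on mode names rather than numeric values, so it does not depend on the exact encoding. Modes with no listed restriction (PASS, EYE_LINEAR, OBJECT_LINEAR) are accepted for any coordinate.

```python
# Hypothetical helper: mode names and per-coordinate restrictions are transcribed
# from the Celsius texgen list; the numeric mode values are deliberately not used.
RESTRICTED_COORDS = {
    "SPHERE_MAP": {"s", "t"},
    "NORMAL_MAP": {"s", "t", "r"},
    "REFLECTION_MAP": {"s", "t", "r"},
}


def texgen_allowed(mode: str, coord: str, tex: int) -> bool:
    """Return True if texgen `mode` may be selected for coordinate `coord` of texture `tex`."""
    if mode == "EMBOSS_MAP":
        # Only supported on s of texture 1 (and, if used, it affects all coordinates).
        return tex == 1 and coord == "s"
    allowed = RESTRICTED_COORDS.get(mode)
    # PASS, EYE_LINEAR and OBJECT_LINEAR carry no documented restriction.
    return allowed is None or coord in allowed


if __name__ == "__main__":
    print(texgen_allowed("SPHERE_MAP", "r", 0))  # False
    print(texgen_allowed("EMBOSS_MAP", "s", 1))  # True
```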


The FE3D shadow copies are kept at:

* MMIO 0x400f40: XFMODE_B
* MMIO 0x400f44: XFMODE_A (writing this register causes the MODE command to be submitted to XF).
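As an illustration of the Celsius XFMODE_B layout, the sketch below decodes the eight LIGHT_MODE fields and FOG_COORD, and checks the rule that every light after a disabled one must also be disabled. The helper names are hypothetical; only the bit positions and value names come from the field list above.

```python
# Hypothetical helpers; bit positions follow the Celsius XFMODE_B list:
# LIGHT_MODE_n occupies bits 2n..2n+1, FOG_COORD occupies bits 16-17.
LIGHT_MODES = ("NONE", "INFINITE", "LOCAL", "SPOTLIGHT")
FOG_COORDS = ("PASS", "DIST_RADIAL", "DIST_ORTHOGONAL", "DIST_ORTHOGONAL_ABS")


def celsius_light_modes(xfmode_b: int) -> list:
    """Decode LIGHT_MODE_0 .. LIGHT_MODE_7 from a Celsius XFMODE_B word."""
    return [LIGHT_MODES[(xfmode_b >> (2 * i)) & 3] for i in range(8)]


def celsius_fog_coord(xfmode_b: int) -> str:
    """Decode the 2-bit FOG_COORD field (bits 16-17)."""
    return FOG_COORDS[(xfmode_b >> 16) & 3]


def light_modes_valid(modes: list) -> bool:
    """Once a light is NONE, every later light must also be NONE."""
    disabled_seen = False
    for mode in modes:
        if mode == "NONE":
            disabled_seen = True
        elif disabled_seen:
            return False
    return True


if __name__ == "__main__":
    modes = celsius_light_modes(0x39)  # lights 0-2: INFINITE, LOCAL, SPOTLIGHT
    print(modes[:4])  # ['INFINITE', 'LOCAL', 'SPOTLIGHT', 'NONE']
    print(light_modes_valid(modes))  # True
    print(light_modes_valid(celsius_light_modes(0x04)))  # False: light 0 off, light 1 on
```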


XFMODE - Kelvin & Rankine 


On Kelvin, XFMODE consists of a single 128-bit vector:

* bits 0-31 aka word 3: XFMODE_T[0] (textures 0 and 1)
* bits 32-63 aka word 2: XFMODE_T[1] (textures 2 and 3)
* bits 64-95 aka word 1: XFMODE_A
* bits 96-127 aka word 0: XFMODE_B

On Rankine, XFMODE consists of two 128-bit vectors:

* vector 0:
  - bits 0-31 aka word 3: XFMODE_A
  - bits 32-63 aka word 2: XFMODE_B
  - bits 64-95 aka word 1: XFMODE_C
  - bits 96-127 aka word 0: unused
* vector 1:
  - bits 0-31 aka word 3: XFMODE_T[0] (texture coordinates 0 and 1)
  - bits 32-63 aka word 2: XFMODE_T[1] (texture coordinates 2 and 3)
  - bits 64-95 aka word 1: XFMODE_T[2] (texture coordinates 4 and 5)
  - bits 96-127 aka word 0: XFMODE_T[3] (texture coordinates 6 and 7)
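The word numbering runs opposite to the bit numbering: word 0 holds the highest 32 bits of a vector. The sketch below, with hypothetical helper names, converts between the two. The `shadow_mmio` formula (base 0x400fb4, 4 bytes per word, 0x10 per vector) is an inference from the Kelvin and Rankine shadow copy lists given later in this section, not a documented rule.

```python
# Hypothetical helpers for the "bits 0-31 aka word 3 ... bits 96-127 aka word 0"
# numbering. SHADOW_BASE and the per-vector stride are inferred from the MMIO
# lists in this section, not taken from a documented rule.
SHADOW_BASE = 0x400FB4


def word_of_bit(bit: int) -> int:
    """Word index holding bit `bit` of a 128-bit XFMODE vector."""
    if not 0 <= bit < 128:
        raise ValueError("XFMODE vectors are 128 bits wide")
    return 3 - bit // 32


def pack_vector(words) -> int:
    """Assemble a 128-bit vector from words[0..3]; word 0 lands in bits 96-127."""
    if len(words) != 4:
        raise ValueError("expected four 32-bit words")
    vector = 0
    for index, word in enumerate(words):
        vector |= (word & 0xFFFFFFFF) << ((3 - index) * 32)
    return vector


def shadow_mmio(vector: int, word: int) -> int:
    """Inferred MMIO address of the FE3D shadow copy of a word (vector 1: Rankine only)."""
    return SHADOW_BASE + 0x10 * vector + 4 * word


if __name__ == "__main__":
    print(word_of_bit(0), word_of_bit(127))  # 3 0
    print(hex(shadow_mmio(0, 1)))  # 0x400fb8 (XFMODE_A on Kelvin)
    print(hex(shadow_mmio(1, 3)))  # 0x400fd0 (XFMODE_T[0] on Rankine)
```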


The words are as follows:

XFMODE_A:

* bits 0-1: LIGHT_MATERIAL_SPECULAR_BACK - one of:
  - 0: NONE
  - 1: COL0
  - 2: COL1
* bits 2-3: LIGHT_MATERIAL_DIFFUSE_BACK
* bits 4-5: LIGHT_MATERIAL_AMBIENT_BACK
* bits 6-7: LIGHT_MATERIAL_EMISSION_BACK
* bits 8-15: PROGRAM_START_POS - index of the first instruction to be executed in PROGRAM_* modes.
* bit 16: SPECULAR_ENABLE - ???
* bit 17: ???, Kelvin LIGHT_MODEL bit 17
* bit 18: LIGHT_MODEL_SEPARATE_SPECULAR - ???
* bits 19-20: LIGHT_MATERIAL_SPECULAR_FRONT
* bits 21-22: LIGHT_MATERIAL_DIFFUSE_FRONT
* bits 23-24: LIGHT_MATERIAL_AMBIENT_FRONT
* bits 25-26: LIGHT_MATERIAL_EMISSION_FRONT
* bit 27: NORMALIZE_ENABLE
* bit 28: LIGHT_MODEL_UNK? - ???
* bit 29: LIGHT_TWO_SIDE_ENABLE
* bit 30: LIGHT_MODEL_LOCAL_VIEWER
* bit 31: LIGHTING_ENABLE


XFMODE_B:

* bits 0-1: LIGHT_MODE_0 - selects how light 0 behaves. One of:
  - 0: NONE - light is disabled. Note that if a light is disabled, all subsequent lights must be disabled as well.
  - 1: INFINITE
  - 2: LOCAL
  - 3: SPOTLIGHT
* bits 2-3: LIGHT_MODE_1 - likewise for light 1.
* bits 4-5: LIGHT_MODE_2
* bits 6-7: LIGHT_MODE_3
* bits 8-9: LIGHT_MODE_4
* bits 10-11: LIGHT_MODE_5
* bits 12-13: LIGHT_MODE_6






* bits 14-15: LIGHT_MODE_7


* bit 16: VIEWPORT_TRANSFORM_SKIP [NV30:] - if set, the position output from the vertex program is assumed
  to already be in screen coordinates, and no viewport transform will be performed. Otherwise, it is assumed to
  be in clip coordinates and will be transformed by the fixed-function viewport transform.
* bit 17: ARITH_RULES [NV30:] - selects how various arithmetic operations behave:
  - 0: LEGACY - semantics as in GL_NV_vertex_program, with various idiosyncrasies (0 times NaN is 0,
    -NaN < -Inf < -0 < 0 < Inf < NaN, etc).
  - 1: MODERN - semantics as in GL_NV_vertex_program2, mostly following IEEE 754.
* bit 18: XFCTX_ACCESS - determines which XFCTX entries are accessible to the running programs:
  - 0: USER_ONLY - only USER will be accessible by indirect accesses; only USER, VIEWPORT_TRANSLATE,
    and VIEWPORT_SCALE will be accessible by direct accesses.
  - 1: FULL - all XFCTX entries are accessible.
* bit 19: FOG_ENABLE - if set, XF&LT computes the fog coordinate. Otherwise, fog computations are not performed.
* bit 20: ???, set by UNK9CC method.
* bit 21: FOG_MODE_EXP [NV20:NV30] - if set, one of the EXP fog modes is used. Otherwise, one of the LINEAR
  modes is used.
* bits 22-24: FOG_COORD [NV20:NV30] - selects how the fog coordinate is computed. One of:
  - 0: SPEC_ALPHA
  - 1: DIST_RADIAL
  - 2: DIST_ORTHOGONAL
  - 3: DIST_ORTHOGONAL_ABS
  - 4: FOG_COORD
* bits 22-23: FOG_COORD [NV30:] - selects how the fog coordinate is computed. One of:
  - 0: SPEC_ALPHA
  - 1: DIST_RADIAL
  - 2: DIST_ORTHOGONAL
  - 3: FOG_COORD
* bit 25: POINT_PARAMS_ENABLE - if set, XF&LT compute point size. Otherwise, constant point size is used.
* bits 26-28: WEIGHT_MODE - selects how weighting works. One of:
  - 0: NONE
  - 1: ???
  - 2: ???
  - 3: ???
  - 4: ???
  - 5: ???
  - 6: ???
* bit 29: XFCTX_WRITE_ENABLE - if set, vertex programs are allowed to write to XFCTX, but will execute
  serially. If clear, writes are blocked, but vertices can be processed in parallel.
* bits 30-31: MODE - selects operating mode, one of:
  - 0: FIXED - full fixed-function transform and lighting
  - 1: BYPASS [NV20:NV30] - minimal computations performed
  - 1: PROGRAM_V3 [NV40:] - vertex program is run, fixed-function computations disabled, third-generation
    ISA features are supported.
  - 2: PROGRAM_V1 - vertex program is run, fixed-function computations disabled, first-generation ISA
    features are supported.
  - 3: PROGRAM_V2 [NV30:] - like above, but second-generation ISA features are supported.

XFMODE_C (only on Rankine):

* bits 0-5: CLIP_PLANE_ENABLE[0-5]
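The sketch below decodes a Kelvin or Rankine XFMODE_B word using the positions listed above. It mainly shows that FOG_COORD changes width between NV20:NV30 (bits 22-24) and NV30: (bits 22-23). The names are hypothetical, fields with unknown meaning (LIGHT_MODE values, WEIGHT_MODE) are returned as raw numbers, and MODE values not listed for a generation decode as "???".

```python
# Hypothetical decoder for a Kelvin/Rankine XFMODE_B word. Value names come from the
# field list above; fields with unknown meaning are returned as raw numbers.
FOG_COORD_NV20 = {0: "SPEC_ALPHA", 1: "DIST_RADIAL", 2: "DIST_ORTHOGONAL",
                  3: "DIST_ORTHOGONAL_ABS", 4: "FOG_COORD"}
FOG_COORD_NV30 = {0: "SPEC_ALPHA", 1: "DIST_RADIAL", 2: "DIST_ORTHOGONAL",
                  3: "FOG_COORD"}
MODE_NV20 = {0: "FIXED", 1: "BYPASS", 2: "PROGRAM_V1"}
MODE_NV30 = {0: "FIXED", 2: "PROGRAM_V1", 3: "PROGRAM_V2"}


def field(word: int, lo: int, width: int) -> int:
    """Extract `width` bits of `word` starting at bit `lo`."""
    return (word >> lo) & ((1 << width) - 1)


def decode_xfmode_b(word: int, rankine: bool) -> dict:
    fields = {
        "LIGHT_MODES": [field(word, 2 * i, 2) for i in range(8)],
        "XFCTX_ACCESS": "FULL" if field(word, 18, 1) else "USER_ONLY",
        "FOG_ENABLE": bool(field(word, 19, 1)),
        "POINT_PARAMS_ENABLE": bool(field(word, 25, 1)),
        "WEIGHT_MODE": field(word, 26, 3),
        "XFCTX_WRITE_ENABLE": bool(field(word, 29, 1)),
    }
    if rankine:
        # NV30: adds bits 16 and 17, and FOG_COORD shrinks to 2 bits.
        fields["VIEWPORT_TRANSFORM_SKIP"] = bool(field(word, 16, 1))
        fields["ARITH_RULES"] = "MODERN" if field(word, 17, 1) else "LEGACY"
        fields["FOG_COORD"] = FOG_COORD_NV30[field(word, 22, 2)]
        fields["MODE"] = MODE_NV30.get(field(word, 30, 2), "???")
    else:
        # NV20:NV30: separate FOG_MODE_EXP bit and a 3-bit FOG_COORD.
        fields["FOG_MODE_EXP"] = bool(field(word, 21, 1))
        fields["FOG_COORD"] = FOG_COORD_NV20.get(field(word, 22, 3), "???")
        fields["MODE"] = MODE_NV20.get(field(word, 30, 2), "???")
    return fields


if __name__ == "__main__":
    word = (3 << 30) | (3 << 22) | (1 << 19) | 1
    print(decode_xfmode_b(word, rankine=True)["MODE"])  # PROGRAM_V2
```

Keeping the per-generation value tables separate makes the overlapping FOG_COORD encodings explicit instead of hiding them behind one shared mask.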


XFMODE_T (two instances on Kelvin, four on Rankine - each describes two textures):

* bit 0: TEX_0_ENABLE - if set, coordinates for texture 0/2/4/6 will be computed. Otherwise, texture unit 0/2/4/6
  will be ignored.
* bit 1: TEX_0_MATRIX_ENABLE - if set, enables transformation of texture 0/2/4/6 coordinates by the texture
  matrix.
* bit 2: TEX_0_R_ENABLE - if set, the r coordinate for texture 0/2/4/6 will be computed. Otherwise, it will be
  ignored.
* bits 4-6: TEX_0_GEN_S - selects how texture 0/2/4/6 coordinate s is generated.
* bits 7-9: TEX_0_GEN_T
* bits 10-12: TEX_0_GEN_R
* bits 13-15: TEX_0_GEN_Q
* bit 16: TEX_1_ENABLE
* bit 17: TEX_1_MATRIX_ENABLE
* bit 18: TEX_1_R_ENABLE
* bits 20-22: TEX_1_GEN_S
* bits 23-25: TEX_1_GEN_T
* bits 26-28: TEX_1_GEN_R
* bits 29-31: TEX_1_GEN_Q

The supported texgen modes are the same as on Celsius.

On Kelvin, the FE3D shadow copies are kept at:

* MMIO 0x400fb4: XFMODE_B
* MMIO 0x400fb8: XFMODE_A
* MMIO 0x400fbc: XFMODE_T[1]
* MMIO 0x400fc0: XFMODE_T[0]

And on Rankine:

* MMIO 0x400fb4: (dummy 0 word)
* MMIO 0x400fb8: XFMODE_C
* MMIO 0x400fbc: XFMODE_B
* MMIO 0x400fc0: XFMODE_A
* MMIO 0x400fc4: XFMODE_T[3]
* MMIO 0x400fc8: XFMODE_T[2]
* MMIO 0x400fcc: XFMODE_T[1]
* MMIO 0x400fd0: XFMODE_T[0]
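Each XFMODE_T word describes two textures, and the second texture's fields sit exactly 16 bits above the first texture's. A decoder therefore only needs one per-texture layout. The sketch below is minimal and uses hypothetical names. The GEN_* values are returned as raw texgen mode numbers.

```python
# Hypothetical decoder for one Kelvin/Rankine XFMODE_T word. The second texture's
# fields sit exactly 16 bits above the first texture's (bit 16 = TEX_1_ENABLE, etc.).
def _texture_fields(half: int) -> dict:
    """Decode one 16-bit half (TEX_0 layout: bits 0, 1, 2, 4-6, 7-9, 10-12, 13-15)."""
    return {
        "ENABLE": bool(half & 1),
        "MATRIX_ENABLE": bool((half >> 1) & 1),
        "R_ENABLE": bool((half >> 2) & 1),
        "GEN_S": (half >> 4) & 7,
        "GEN_T": (half >> 7) & 7,
        "GEN_R": (half >> 10) & 7,
        "GEN_Q": (half >> 13) & 7,
    }


def decode_xfmode_t(word: int, index: int) -> dict:
    """Decode XFMODE_T[index], which covers textures 2*index and 2*index + 1."""
    return {
        2 * index: _texture_fields(word & 0xFFFF),
        2 * index + 1: _texture_fields((word >> 16) & 0xFFFF),
    }


if __name__ == "__main__":
    word = 1 | (1 << 1) | (3 << 4) | (1 << 16) | (2 << 29)
    textures = decode_xfmode_t(word, 1)
    print(sorted(textures))  # [2, 3]
    print(textures[2]["GEN_S"], textures[3]["GEN_Q"])  # 3 2
```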


Curie XF bundles 


XF_A:

* bit 0: ???, set by UNK9CC method [NV40:NV41]
* bit 2: XFCTX_ACCESS [NV40:NV41]
* bits 3-4: LIGHT_MATERIAL_EMISSION_FRONT [NV40:NV41]
* bits 5-6: LIGHT_MATERIAL_AMBIENT_FRONT [NV40:NV41]
* bits 7-8: LIGHT_MATERIAL_DIFFUSE_FRONT [NV40:NV41]
* bits 9-10: LIGHT_MATERIAL_SPECULAR_FRONT [NV40:NV41]
* bits 11-12: LIGHT_MATERIAL_EMISSION_BACK [NV40:NV41]
* bits 13-14: LIGHT_MATERIAL_AMBIENT_BACK [NV40:NV41]
* bits 15-16: LIGHT_MATERIAL_DIFFUSE_BACK [NV40:NV41]
* bits 17-18: LIGHT_MATERIAL_SPECULAR_BACK [NV40:NV41]
* bits 19-21: FOG_COORD [NV40:NV41]
* bit 22: LIGHTING_ENABLE [NV40:NV41]
* bits 23-25: WEIGHT_MODE [NV40:NV41]
* bit 26: NORMALIZE_ENABLE [NV40:NV41]
* bit 28: VIEWPORT_TRANSFORM_SKIP

XF_LIGHT [NV40:NV41]:

* bits 0-1: LIGHT_MODE_0
* bits 2-3: LIGHT_MODE_1
* bits 4-5: LIGHT_MODE_2
* bits 6-7: LIGHT_MODE_3
* bits 8-9: LIGHT_MODE_4
* bits 10-11: LIGHT_MODE_5
* bits 12-13: LIGHT_MODE_6
* bits 14-15: LIGHT_MODE_7
* bit 16: LIGHT_MODEL_LOCAL_VIEWER
* bit 17: ???, Kelvin LIGHT_MODEL bit 17
* bit 18: LIGHT_MODEL_SEPARATE_SPECULAR - ???

XF_C:

* bits 0-9: PROGRAM_START_POS
* bit 27: ARITH_RULES
* bits 30-31: MODE

XF_D:

* bits 0-15: TIMEOUT
* bit 16: ???, set by UNKIEFS bit 20

XF_TXC:

* bits 0-2: TEX_GEN_S [NV40:NV41] [only present for first 8 coords]
* bits 4-6: TEX_GEN_T [NV40:NV41] [only present for first 8 coords]
* bits 8-10: TEX_GEN_R [NV40:NV41] [only present for first 8 coords]
* bits 12-14: TEX_GEN_Q [NV40:NV41] [only present for first 8 coords]
* bit 16: TEX_MATRIX_ENABLE [NV40:NV41] [only present for first 8 coords]
* bit 17: ???
* bit 18: ???
* bit 19: ???

Todo: Incomplete list.


Mode setting methods 


Todo: write me 


XF instruction set 


Contents 


* XF instruction set
  - Introduction
  - Program execution environment
  - Instruction encoding and storage
    * RDI access
  - Instruction execution
    * Reading sources
    * Writing outputs
    * Output addresses
  - Instructions
  - XFPR command
    * Kelvin -> Rankine ISA conversion
    * Rankine -> combined ISA conversion
    * Curie -> combined ISA conversion
  - Instruction upload methods


Introduction 


XF uses a VLIW instruction set. Roughly, a single instruction can do all of the following:

1. Read one IBUF slot.
2. Read one XFCTX slot.
3. Read three source operands:
   * each source can be independently selected from:
     - the value read from the IBUF slot
     - the value read from the XFCTX slot
     - an arbitrary temporary register
   * an arbitrary swizzle can be applied to each source component
   * starting with NV30, each source can optionally be replaced with its absolute value
   * each source can optionally be negated
4. Perform one vector operation (using sources #0, #1, and maybe #2) on the ALU+MLU.
5. Perform one scalar operation (using source #2) on the ILU or SFU.
6. Perform an optional saturation on the results.
7. Write the results (with masking) to temporary registers.
8. Write the results (with masking) to either the output buffers or XFCTX [NV20:NV40].
9. Optionally, end vertex processing (and submit results downstream).
There are 5 instruction sets used by XF:

1. Celsius ISA: used internally by Celsius GPUs as microcode to perform the fixed-function processing. Not
   accessible in any way from the outside, so the encoding will not be described here, but the computation
   primitives are roughly the same as in later ISAs and will be described here.
2. Kelvin ISA: used natively by Kelvin GPUs to store the instructions in XFPR RAM. Can be uploaded by the user
   through the Kelvin classes. Supported by Rankine GPUs in compatibility mode through dynamic translation to
   Rankine ISA. Corresponds to the GL_NV_vertex_program extension.
3. Rankine ISA: used natively by Rankine GPUs and can be uploaded through Rankine classes. Supported
   by NV40:NV41 in compatibility mode through dynamic translation to the combined ISA. Corresponds to the
   GL_NV_vertex_program2 extension. Is a proper superset of the Kelvin ISA.
4. Curie ISA: used natively by NV41:G80 GPUs and can be uploaded through Curie classes. Supported
   by NV40:NV41 in compatibility mode through dynamic translation to the combined ISA. Corresponds to the
   GL_NV_vertex_program3 extension. Is not a proper superset of the Rankine ISA.
5. Combined ISA: used natively by NV40:NV41 GPUs. Cannot be directly uploaded by the user. Is more or less
   a sum of the Rankine and Curie ISAs.


Program execution environment 


The XF can execute the following kinds of programs: 


1. Simple vertex programs. Started when IDX signals that a full vertex has been written to the VAB. The VAB 
contents are copied to the IBUF beforehand, and when the program is done, outputs will be sent to VTX for 
further processing by the graphics pipeline. Multiple vertex programs can be executing in parallel at a given 
moment (up to 3 per VPE). The only effect of a simple vertex program is emitting a transformed vertex. 


2. Vertex programs with side effects [NV20:NV40]. Started just like simple vertex programs (a global mode bit 
determines whether a simple program or a program with side effects is launched), but can write to XFCTX in 
addition to their normal powers, and nothing else can be happening on XF while one is running. 


3. Vertex state programs [NV20:NV40]. Started by the RUN XF command. Their only input is a single vector
submitted beforehand by the PARAM XF command. They have no output, and their only possible effect is
updating XFCTX. Nothing else can be happening on XF while a vertex state program is being executed. Once
the program completes, XF moves on to the next input command, without submitting anything downstream.


Every program has the following private state while it's executing: 


1. IBUF, the input buffer, read only by the program. On Celsius, is made of 7 vectors. On Kelvin and up, is made 
of 16 vectors. For vertex programs, contains a complete copy of VAB (except the passthrough slot) captured at 
the moment of program start. For vertex state programs, only the first vector is usable, and it contains a copy of 
VAB passthrough slot (which should have been set by XF PARAM command). 


2. XFREG, the temporary register file. Made of 12 vector registers on Kelvin, 16 vector registers on Rankine, ??? 
vector registers on Curie. On Celsius, allegedly made of 8 vector registers, but it's impossible to tell. 


Starting with Kelvin, the register file is cleared to all-0 between executions. More precisely, the clear is
performed after each program execution, as well as after an XF reset.


3. AREG [NV20:], the address register file. On Kelvin, this is a single signed 9-bit integer register (or maybe 
larger, it's impossible to tell). On Rankine, contains 2 vector registers, each made of 4 components, where each 
component is a 10-bit signed integer. On Curie, is likewise made of 2 4-component vector registers, where each 
component is a ???-bit signed integer. 


4. CREG [NV30:], the condition register file. On Rankine, this is a single 4-component vector register, where each 
component is a 2-bit condition code. The codes are: 


* U: unordered (result was a NaN) 
* L: less than (result was negative) 
* E: equal (result was a 0) 
* G: greater than (result was positive) 
On Curie, this contains 2 4-component vector registers, with the same structure. 


5. PC: the program counter. Basically, a pointer in XFPR RAM. For vertex programs, initialized from the starting 
PC in XFMODE or XF PROG bundle. For vertex state program, the initial PC is sent as the payload of the 
RUN command. 


6. ICNT [NV30:]: the instruction counter. Counts the number of instructions executed by the program so far. 
Initialized to 0 on program start. When it hits the timeout value, the program is forcibly stopped. 


7. stack [NV30:]: an 8-slot call/return stack. On Curie, can also be used to push and pop address registers. 
8. TBUF: the main output buffer. Write only by the program, contains data to be sent to VTX once the program is
done. On Celsius, made of 5 float vectors. On Kelvin and up, made of 16 float vectors.


9. STPOS [NV20:NV40?]: shadow TBUF position. A single vector register that receives a copy of anything
written to TBUF slot 0 and can be read back by the program. Used on Kelvin to implement viewport transformation
transparently with respect to user shaders.


10. WBUF [NV10:NV30]: one of the LT output buffers. Write only by the program, contains data to be sent to LT 
once the program is done. Made of 17 3-component vectors of 22-bit floats. While it can be written by user 
programs, it is only useful for fixed function processing. 


11. VBUF [NV10:NV30]: the other LT output buffer. Like WBUF, except has 13 entries instead of 17. 


12. UBUF [NV30:NV40]: the unified LT output buffer. Same purpose as WBUF and VBUF, but is made of 10
5-component vectors of 22-bit floats.


Todo: NV34 (and presumably all Kelvins and Rankines) have SIPOS, which is a copy of the first IBUF word with 
unknown purpose. 


In addition, all running programs have access to the following shared resources: 


* mode bits (XFMODE or state bundles): control various aspects of XF operation.
* XFCTX: the context RAM. Contains state used by fixed-function transform, as well as parameters to
  user-defined programs. Can be read by all types of programs, and can be written by vertex programs with side
  effects and by vertex state programs.
* XFPR [NV20:]: the program code RAM. Contains the code of user-defined programs.
* XTRA [NV30:NV41]: ??? contains 2 vectors of 8 9-bit numbers.
* TIMEOUT [NV30:]: a 16-bit number specifying the maximum number of instructions that a single program is
  allowed to execute. On Curie, this is part of the state bundles, but on Rankine it's a standalone piece of state.
* XFTEX [NV40:]: 4 textures with limited functionality available for sampling by programs.


Instruction encoding and storage 


User-submitted instructions are stored in the XFPR RAM, which is: 
* on Kelvin: a global array of 0x88 92-bit words in Kelvin ISA encoding. 
* on Rankine: a global array of 0x118 112-bit words in Rankine ISA encoding. 
* on NV40:NV41: a per-VPE array of 0x220 144-bit words in combined ISA encoding. 
* on NV41:G80: a per-VPE array of 0x220 127-bit words in Curie ISA encoding. 


On NV10:NV41, the XF unit also has instruction ROM with programs for fixed-function processing, but it is not 
accessible in any way. 


The instruction words are encoded as follows: 


Kelvin | Rankine | combined | Curie | Field
0 | 0 | 0 | 0 | END
1 | 1 | 1 | 1 | XFCTX_INDEXED
2 | - | - | - | OUT_IS_SCA
3-10 | 2-10 | 2-6 | 2-6 | OUT_ADDR



11 | 11 | - | - | OUT_TARGET
12-15 | - | - | - | OUT_WM
- | 12-15 | 132-135 | - | OUT_WM_VEC
- | 16-19 | 128-131 | - | OUT_WM_SCA
- | - | 7-12 | 7-12 | DST_SCA
24-27 | 20-23 | 13-16 | 13-16 | DST_WM_VEC
20-23 | 112-116 | - | - | DST
16-19 | 24-27 | 17-20 | 17-20 | DST_WM_SCA
- | - | 111-116 | 111-116 | DST_VEC
28-42 | 28-42 | 21-37 | 21-37 | SRC2
43-57 | 43-57 | 38-54 | 38-54 | SRC1
58-72 | 58-72 | 55-71 | 55-71 | SRC0
73-76 | 73-76 | 72-75 | 72-75 | IBUF_ADDR
- | 77 | - | - | ???
77-84 | 78-86 | 76-85 | 76-85 | XFCTX_ADDR
85-88 | 87-91 | 86-90 | 86-90 | OP_VEC
89-91 | 92-96 | 91-95 | 91-95 | OP_SCA
- | 97-98 | 96-97 | 96-97 | ASRC_SWZ
- | 99-106 | 98-105 | 98-105 | CSRC_SWZ
- | 107-109 | 106-108 | 106-108 | COND_TEST
- | 110 | 109 | 109 | COND_ENABLE
- | 111 | 110 | 110 | CDST_WM
- | 117 | 117 | 117 | SRC0_ABS
- | 118 | 118 | 118 | SRC1_ABS
- | 119 | 119 | 119 | SRC2_ABS
- | 120 | 120 | 120 | ASRC
- | 121 | - | - | unused?
- | - | 121 | 121 | CSRCDST
- | - | 122 | 122 | SAT
- | - | 123 | 123 | IBUF_INDEXED
- | - | 124 | 124 | OUT_INDEXED
- | ? | 125 | 125 | CDST_IS_VEC
- | - | 126 | 126 | OUT_IS_VEC
- | - | 127 | - | WAS_CURIE
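Decoding an XFPR word reduces to extracting the inclusive bit ranges in the table. The sketch below uses the Kelvin column. It also checks that those fields tile bits 0-91 without overlap, which is consistent with the 92-bit Kelvin word size. It assumes the instruction is available as a single Python integer whose bit 0 matches table bit 0. The upload format is not modelled, and the helper names are hypothetical.

```python
# Bit ranges (inclusive) copied from the Kelvin column of the encoding table.
# Assumption: the whole instruction is held in one Python int whose bit 0 is
# table bit 0; how the words are split for upload is not modelled here.
KELVIN_FIELDS = {
    "END": (0, 0),
    "XFCTX_INDEXED": (1, 1),
    "OUT_IS_SCA": (2, 2),
    "OUT_ADDR": (3, 10),
    "OUT_TARGET": (11, 11),
    "OUT_WM": (12, 15),
    "DST_WM_SCA": (16, 19),
    "DST": (20, 23),
    "DST_WM_VEC": (24, 27),
    "SRC2": (28, 42),
    "SRC1": (43, 57),
    "SRC0": (58, 72),
    "IBUF_ADDR": (73, 76),
    "XFCTX_ADDR": (77, 84),
    "OP_VEC": (85, 88),
    "OP_SCA": (89, 91),
}


def get_bits(value: int, lo: int, hi: int) -> int:
    """Extract the inclusive bit range lo..hi of an arbitrarily wide integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def covered_bits(fields: dict) -> set:
    """Return every bit used by a field map, rejecting overlapping fields."""
    used = set()
    for name, (lo, hi) in fields.items():
        span = set(range(lo, hi + 1))
        if used & span:
            raise ValueError(f"field {name} overlaps another field")
        used |= span
    return used


def decode(insn: int, fields: dict = KELVIN_FIELDS) -> dict:
    """Split an instruction word into its named fields."""
    return {name: get_bits(insn, lo, hi) for name, (lo, hi) in fields.items()}


if __name__ == "__main__":
    # The Kelvin fields tile bits 0-91 exactly, matching the 92-bit XFPR words.
    print(covered_bits(KELVIN_FIELDS) == set(range(92)))  # True
    insn = (0x04 << 85) | (0x02 << 89) | 1  # OP_VEC 0x04, OP_SCA 0x02, END set
    fields = decode(insn)
    print(fields["OP_VEC"], fields["OP_SCA"], fields["END"])  # 4 2 1
```

Going by the opcode lists in the Instructions section, OP_VEC 0x04 is MAD and OP_SCA 0x02 is RCP.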


SRC* fields are further subdivided as follows: 


Kelvin | Rankine | combined | Curie | Field
0-1 | 0-1 | 0-1 | 0-1 | SRCx_MUX
2-5 | 2-5 | 2-7 | 2-7 | SRCx_REG
6-13 | 6-13 | 8-15 | 8-15 | SRCx_SWZ
14 | 14 | 16 | 16 | SRCx_NEG


8-bit SWZ fields represent vector swizzles and are made of the following subfields: 
* bits 0-1: W 
* bits 2-3: Z 
* bits 4-5: Y 
* bits 6-7: X 
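A source operand extracted from SRC0, SRC1 or SRC2 can then be split into its subfields, and its swizzle can be rendered as text. This sketch takes the subfield ranges from the SRC* table above. The mapping of 2-bit selector values 0-3 to components x/y/z/w is an assumption, and the meaning of the MUX values is not modelled. Function names are hypothetical.

```python
# SRCx subfield ranges (inclusive) from the SRC* table: MUX, REG, SWZ, NEG bit.
SRC_LAYOUT = {
    "kelvin": ((0, 1), (2, 5), (6, 13), 14),
    "rankine": ((0, 1), (2, 5), (6, 13), 14),
    "combined": ((0, 1), (2, 7), (8, 15), 16),
    "curie": ((0, 1), (2, 7), (8, 15), 16),
}

# Assumption: a 2-bit selector value of 0/1/2/3 picks source component x/y/z/w.
COMPONENTS = "xyzw"


def get_bits(value: int, lo: int, hi: int) -> int:
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def swizzle_str(swz: int) -> str:
    """Render an 8-bit SWZ field; X is in bits 6-7, Y in 4-5, Z in 2-3, W in 0-1."""
    return "".join(COMPONENTS[(swz >> shift) & 3] for shift in (6, 4, 2, 0))


def decode_src(src: int, isa: str) -> dict:
    """Split one SRCx field of the given ISA into MUX, REG, SWZ and NEG."""
    mux, reg, swz, neg = SRC_LAYOUT[isa]
    return {
        "MUX": get_bits(src, *mux),
        "REG": get_bits(src, *reg),
        "SWZ": swizzle_str(get_bits(src, *swz)),
        "NEG": bool((src >> neg) & 1),
    }


if __name__ == "__main__":
    print(swizzle_str(0x1B))  # xyzw (identity swizzle under the assumed mapping)
    print(decode_src((1 << 14) | (0x1B << 6) | (3 << 2) | 2, "kelvin"))
```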
RDI access


Todo: write me 


Instruction execution 


Reading sources 


Todo: write me 


Writing outputs 


Todo: write me 


Output addresses 


Todo: write me 


Instructions 


The vector opcodes are: 

* 0x00: NOP
* 0x01: MOV
* 0x02: MUL
* 0x03: ADD
* 0x04: MAD
* 0x05: DP3
* 0x06: DPH
* 0x07: DP4
* 0x08: DST [NV20:]
* 0x09: MIN [NV20:]
* 0x0a: MAX [NV20:]
* 0x0b: SLT [NV20:]
* 0x0c: SGE [NV20:]
* 0x0d: ARL [NV20:]
* 0x0e: FRC [NV30:]
* 0x0f: FLR [NV30:]
* 0x10: SEQ [NV30:]
* 0x11: SFL [NV30:]
* 0x12:
* 0x13:
* 0x14:
* 0x15:
* 0x16:
* 0x17: ARR [NV30:]
* 0x18: ARA [NV30:]
* 0x19: TXL [NV40:]


The scalar opcodes are: 


* 0x00: NOP
* 0x01: MOV
* 0x02: RCP
* 0x03: RCC
* 0x04: RSQ
* 0x05: EXP [NV20:]
* 0x06: LOG [NV20:]
* 0x07: LIT [NV20:]
* 0x08: ??? [NV30:]
* 0x09: BRA [NV30:]
* 0x0a: ??? [NV30:]
* 0x0b: CAL [NV30:]
* 0x0c: RET [NV30:]
* 0x0d: LG2 [NV30:]
* 0x0e: EX2 [NV30:]
* 0x0f: SIN [NV30:]
* 0x10: COS [NV30:]
* 0x11: ??? [NV40:]
* 0x12: ??? [NV40:]
* 0x13: PUSHA [NV40:]
* 0x14: POPA [NV40:]
Todo: write me


XFPR command 


Todo: write me 


Kelvin -> Rankine ISA conversion 


Todo: write me 


Rankine -> combined ISA conversion 


Todo: write me 


Curie -> combined ISA conversion 


Todo: write me 


Instruction upload methods 


Todo: write me 


2.10 falcon microprocessor 


Contents: 


2.10.1 Introduction 


falcon is a class of general-purpose microprocessor units, used in multiple instances on nvidia GPUs starting from 
G98. Originally developed as the controlling logic for VP3 video decoding engines as a replacement for xtensa used 
on VP2, it was later used in many other places, whenever a microprocessor of some sort was needed. 


A single falcon unit is made of: 


* the core microprocessor with its code and data SRAM [see Processor control]
* an IO space containing control registers of all subunits, accessible from the host as well as from the code running
  on the falcon microprocessor [see IO space]
* common support logic:
  - interrupt controller [see Interrupt delivery]
  - periodic and watchdog timers [see Timers]
  - scratch registers for communication with host [see Scratch registers]
  - PCOUNTER signal output [see Performance monitoring signals]
  - some unknown other stuff
* optionally, FIFO interface logic, for falcon units used as PFIFO engines and some others [see FIFO interface]
* optionally, common memory interface logic [see Memory interface]. However, some engines have their own
  type of memory interface.
* optionally, a cryptographic AES coprocessor. A falcon unit with such a coprocessor is called a "secretful" unit.
  [see Cryptographic coprocessor]
* any unit-specific logic the microprocessor is supposed to control


Todo: 


figure out remaining circuitry 


The base falcon hardware comes in several different revisions:

* version 0: used on G98, MCP77, MCP79
* version 3: used on GT215+, adds a crude VM system for the code segment, edge/level interrupt modes, new instructions [division, software traps, bitfield manipulation, ...], and other features
* version 4: used on GF119+ for some engines [others are still version 3]: adds support for 24-bit code addressing, debugging and ???
* version 4.1: used on GK110+ for some engines, changes unknown
* version 5: used on GK208+ for some engines, redesigned ISA encoding

Todo: figure out v4 new stuff

Todo: figure out v4.1 new stuff

Todo: figure out v5 new stuff


The falcon units present on nvidia cards are:

* The VP3/VP4/VP5 engines [G98 and MCP77:GM107]:
  - PVLD, the variable length decoder
  - PPDEC, the picture decoder
  - PPPP, the video post-processor
* the VP6 engine [GM107-]:
  - PVDEC, the video decoder
* The VP3 security engine [G98, MCP77, MCP79, GM107-]:
  - PSEC, the security engine
* The GT215:GK104 copy engines:
  - PCOPY[0] [GT215:GK104]
  - PCOPY[1] [GF100:GK104]
* The GT215+ daemon engines:
  - PDAEMON [GT215+]
  - PDISPLAY.DAEMON [GF119+]
  - PUNK1C3 [GF119+]
* The Fermi PGRAPH CTXCTL engines:
  - PGRAPH.CTXCTL ../graph/gf100-ctxctl/intro.txt
  - PGRAPH.GPC[*].CTXCTL ../graph/gf100-ctxctl/intro.txt
* PVCOMP, the video compositing engine [MCP89:GF100]
* PVENC, the H.264 encoding engine [GK104+]


2.10.2 ISA 


This file describes the ISA used by the falcon microprocessor, which is introduced in Introduction.


Contents

* ISA
  - Registers
    * $flags register
    * $p predicates
  - Instructions
    * Sized
    * Unsized
  - Code segment
  - Invalid opcode handling


Registers 


There are 16 32-bit GPRs, $r0-$r15. There are also a dozen or so special registers:

Index | Name | Present on | Description
$sr0 | $iv0 | all units | Interrupt 0 vector
$sr1 | $iv1 | all units | Interrupt 1 vector
$sr3 | $tv | all units | Trap vector
$sr4 | $sp | all units | Stack pointer
$sr5 | $pc | all units | Program counter
$sr6 | $xcbase | all units | Code xfer external base
$sr7 | $xdbase | all units | Data xfer external base
$sr8 | $flags | all units | Misc flags
$sr9 | $cx | crypto units | Crypt xfer mode
$sr10 | $cauth | crypto units | Crypt auth code selection
$sr11 | $xtargets | all units | Xfer port selection
$sr12 | $tstatus | v3+ units | Trap status


$flags register 


$flags [$sr8] register contains various flags controlling the operation of the falcon microprocessor. It is split into the following bitfields:

Bits | Name | Present on | Description
0-7 | $p0-$p7 | all units | General-purpose predicates
8 | c | all units | Carry flag
9 | o | all units | Signed overflow flag
10 | s | all units | Sign/negative flag
11 | z | all units | Zero flag
16 | ie0 | all units | Interrupt 0 enable
17 | ie1 | all units | Interrupt 1 enable
18 | ??? | v4+ units | ???
20 | is0 | all units | Interrupt 0 saved enable
21 | is1 | all units | Interrupt 1 saved enable
22 | ??? | v4+ units | ???
24 | ta | all units | Trap handler active
26-28 | ??? | v4+ units | ???
29-31 | ??? | v4+ units | ???


Todo: figure out v4+ stuff 
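The bit positions in the table above can be captured as constants. The following is a minimal C sketch; the enum and helper names (`falcon_flag_bit`, `flag_get`, `flag_put`, `pred_get`) are hypothetical, and only the fields documented for all units are included (the unknown v4+ bits are left out).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the $flags fields documented for all units.
   Names are illustrative, not taken from any existing header. */
enum falcon_flag_bit {
    FLAG_P0  = 0,    /* $p0..$p7 occupy bits 0-7 */
    FLAG_C   = 8,
    FLAG_O   = 9,
    FLAG_S   = 10,
    FLAG_Z   = 11,
    FLAG_IE0 = 16,
    FLAG_IE1 = 17,
    FLAG_IS0 = 20,
    FLAG_IS1 = 21,
    FLAG_TA  = 24,
};

/* Read one bit of a $flags value. */
static inline bool flag_get(uint32_t flags, unsigned bit) {
    return (flags >> bit) & 1;
}

/* Return a copy of $flags with one bit replaced by val. */
static inline uint32_t flag_put(uint32_t flags, unsigned bit, bool val) {
    return (flags & ~(UINT32_C(1) << bit)) | ((uint32_t)val << bit);
}

/* Predicate $pN lives in bit N (N = 0..7). */
static inline bool pred_get(uint32_t flags, unsigned n) {
    return flag_get(flags, FLAG_P0 + (n & 7));
}
```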


$p predicates 


$flags.p0-p7 are general-purpose single-bit flags. They can be used to store single-bit variables. They can be set via the bset, bclr, btgl, and setp instructions. They can be read by the xbit instruction, or checked by the sleep and bra instructions.


Instructions 


Instructions have 2, 3, or 4 bytes. The first byte of an instruction determines its length and format. The high 2 bits of the first byte determine the instruction's operand size: 00 means 8-bit, 01 means 16-bit, 10 means 32-bit, and 11 means an instruction that doesn't use operand sizing. The set of available opcodes varies greatly with the instruction format.
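The operand-size rule above is a two-bit switch on the first byte. A minimal C sketch; `falcon_operand_size` is a hypothetical helper name, and returning 0 for the unsized class is a convention chosen only for this example.

```c
#include <stdint.h>

/* Operand size from the high 2 bits of the first instruction byte.
   Returns 8, 16 or 32, or 0 for binary 11 (no operand sizing). */
static int falcon_operand_size(uint8_t byte0) {
    switch (byte0 >> 6) {
    case 0:  return 8;
    case 1:  return 16;
    case 2:  return 32;
    default: return 0;   /* 11: unsized instruction */
    }
}
```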
The subopcode can be stored in one of the following places:

* O1: subopcode goes to low 4 bits of byte 0
* O2: subopcode goes to low 4 bits of byte 1
* OL: subopcode goes to low 6 bits of byte 1
* O3: subopcode goes to low 4 bits of byte 2


The operands are denoted as follows:

* R1x: register encoded in low 4 bits of byte 1
* R2x: register encoded in high 4 bits of byte 1
* R3x: register encoded in high 4 bits of byte 2
* RxS: register used as source
* RxD: register used as destination
* RxSD: register used as both source and destination
* I8: 8-bit immediate encoded in byte 2
* I16: 16-bit immediate encoded in bytes 2 [low part] and 3 [high part]
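The field positions above map directly onto byte and nibble extractors. This C sketch assumes the instruction bytes are already in a buffer `b` that is long enough for the format; it does not check which fields the opcode actually has (O2 and R1x share the same nibble, for example). All function names are hypothetical.

```c
#include <stdint.h>

/* Subopcode positions. */
static unsigned sub_o1(const uint8_t *b) { return b[0] & 0xf; }   /* O1 */
static unsigned sub_o2(const uint8_t *b) { return b[1] & 0xf; }   /* O2 */
static unsigned sub_ol(const uint8_t *b) { return b[1] & 0x3f; }  /* OL */
static unsigned sub_o3(const uint8_t *b) { return b[2] & 0xf; }   /* O3 */

/* Register fields. */
static unsigned reg_r1(const uint8_t *b) { return b[1] & 0xf; }   /* R1x */
static unsigned reg_r2(const uint8_t *b) { return b[1] >> 4; }    /* R2x */
static unsigned reg_r3(const uint8_t *b) { return b[2] >> 4; }    /* R3x */

/* Immediates: I16 is little-endian in bytes 2 and 3. */
static uint8_t  imm_i8(const uint8_t *b)  { return b[2]; }
static uint16_t imm_i16(const uint8_t *b) { return (uint16_t)(b[2] | b[3] << 8); }
```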


Sized 


Sized opcodes are [low 6 bits of opcode]:

* 0x: O1 R2S R1S I8
* 1x: O1 R1D R2S I8
* 2x: O1 R1D R2S I16
* 30: O2 R2S I8
* 31: O2 R2S I16
* 34: O2 R2D I8
* 36: O2 R2SD I8
* 37: O2 R2SD I16
* 38: O3 R2S R1S
* 39: O3 R1D R2S
* 3a: O3 R2D R1S
* 3b: O3 R2SD R1S
* 3c: O3 R3D R2S R1S
* 3d: O2 R2SD

Todo: long call/branch


The subopcodes are as follows:

Instruction | Subopcodes [opcode: subopcode] | imm | flg0 | flg3+ | Cycles | Present on | Description
st | 0x: 0, 38: 0 | U | - | - | 1 | all units | store
st [sp] | 30: 1, 38: 1 | U | - | - |  | all units | store
cmpu | 30: 4, 31: 4, 38: 4 | U | CZ | CZ | 1 | all units | unsigned compare
cmps | 30: 5, 31: 5, 38: 5 | S | CZ | CZ | 1 | all units | signed compare
cmp | 30: 6, 31: 6, 38: 6 | S | N/A | COSZ | 1 | v3+ units | compare
add | 1x: 0, 2x: 0, 36: 0, 37: 0, 3b: 0, 3c: 0 | U | COSZ | COSZ | 1 | all units | add
adc | 1x: 1, 2x: 1, 36: 1, 37: 1, 3b: 1, 3c: 1 | U | COSZ | COSZ | 1 | all units | add with carry
sub | 1x: 2, 2x: 2, 36: 2, 37: 2, 3b: 2, 3c: 2 | U | COSZ | COSZ | 1 | all units | subtract
sbb | 1x: 3, 2x: 3, 36: 3, 37: 3, 3b: 3, 3c: 3 | U | COSZ | COSZ | 1 | all units | subtract with borrow
shl | 1x: 4, 36: 4, 3b: 4, 3c: 4 | U | C | COSZ | 1 | all units | shift left
shr | 1x: 5, 36: 5, 3b: 5, 3c: 5 | U | C | COSZ | 1 | all units | shift right
sar | 1x: 7, 36: 7, 3b: 7, 3c: 7 | U | C | COSZ | 1 | all units | shift right with sign
ld | 1x: 8, 3c: 8 | U | - | - | 1 | all units | load
shlc | 1x: c, 36: c, 3b: c, 3c: c | U | C | COSZ | 1 | all units | shift left with carry
shrc | 1x: d, 36: d, 3b: d, 3c: d | U | C | COSZ | 1 | all units | shift right with carry
ld [sp] | 34: 0, 3a: 0 | U | - | - |  | all units | load
not | 39: 0, 3d: 0 |  | OSZ | OSZ | 1 | all units | bitwise not
neg | 39: 1, 3d: 1 |  | OSZ | OSZ | 1 | all units | sign negation
mov | 39: 2, 3d: 2 |  | N/A | - | 1 | v3+ units | move
hswap | 39: 3, 3d: 3 |  | OSZ | OSZ | 1 | all units | swap halves
Unsized


Unsized opcodes are:

* cx: O1 R1D R2S I8
* dx: O1 R2S R1S I8
* ex: O1 R1D R2S I16
* f0: O2 R2SD I8
* f1: O2 R2SD I16
* f2: O2 R2S I8
* f4: OL I8
* f5: OL I16
* f8: O2
* f9: O2 R2S
* fa: O3 R2S R1S
* fc: O2 R2D
* fd: O3 R2SD R1S
* fe: O3 R1D R2S
* ff: O3 R3D R2S R1S


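The subopcodes are as follows:

Instruction | Subopcodes [opcode: subopcode] | imm | flg0 | flg3+ | Cycles | Present on
mulu | cx: 0, ex: 0, f0: 0, f1: 0, fd: 0, ff: 0 | U | - | - | 1 | all units
muls | cx: 1, ex: 1, f0: 1, f1: 1, fd: 1, ff: 1 | S | - | - | 1 | all units
sext | cx: 2, f0: 2, fd: 2, ff: 2 | U | SZ | SZ | 1 | all units
extrs | cx: 3, ex: 3, ff: 3 | U | N/A | SZ | 1 | v3+ units
sethi | f0: 3, f1: 3 | U | - | - | 1 | all units
and | cx: 4, ex: 4, f0: 4, f1: 4, fd: 4, ff: 4 | U | - | COSZ | 1 | all units
or | cx: 5, ex: 5, f0: 5, f1: 5, fd: 5, ff: 5 | U | - | COSZ | 1 | all units
xor | cx: 6, ex: 6, f0: 6, f1: 6, fd: 6, ff: 6 | U | - | COSZ | 1 | all units
extr | cx: 7, ex: 7, ff: 7 | U | N/A | SZ | 1 | v3+ units
mov | f0: 7, f1: 7 | S | - | - | 1 | all units
xbit | cx: 8, ff: 8 | U | - | SZ | 1 | all units
bset | f0: 9, fd: 9 | U | - | - | 1 | all units
bclr | f0: a, fd: a | U | - | - | 1 | all units
btgl | f0: b, fd: b | U | - | - | 1 | all units
ins | cx: b, ex: b | U | N/A | - | 1 | v3+ units
xbit[fl] | f0: c, fe: c | U | - | SZ |  | all units
div | cx: c, ex: c, ff: c | U | N/A | - | 30-33 | v3+ units
mod | cx: d, ex: d, ff: d | U | N/A | - | 30-33 | v3+ units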
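??? | cx: e, ff: e | U | - | - |  | all units
iord | cx: f, ff: f | U | - | - | ~1-x | all units
iowr | dx: 0, fa: 0 | U | - | - | 1-x | all units
iowrs | dx: 1, fa: 1 | U | N/A | - | 9-x | v3+ units
xcld | fa: 4 |  | - | - |  | all units
xdld | fa: 5 |  | - | - |  | all units
xdst | fa: 6 |  | - | - |  | all units
setp | f2: 8, fa: 8 |  | - | - |  | all units
ccmd | f2: c, f4: 3c, f5: 3c |  | - | - |  | crypto units
bra | f4: 0x, f5: 0x | S | - | - | 5 | all units
bra | f4: 1x, f5: 1x | S | - | - | 5 | all units
jmp | f4: 20, f5: 20, f9: 4 | U | - | - | 4-5 | all units
call | f4: 21, f5: 21, f9: 5 | U | - | - | 4-5 | all units
sleep | f4: 28 | U | - | - | N/A | all units
add [sp] | f4: 30, f5: 30, f9: 1 | S | - | - | 1 | all units
bset[fl] | f4: 31, f9: 9 | U | - | - |  | all units
bclr[fl] | f4: 32, f9: a | U | - | - |  | all units
btgl[fl] | f4: 33, f9: b | U | - | - |  | all units
ret | f8: 0 |  | - | - | 5-6 | all units
iret | f8: 1 |  | - | - |  | all units
exit | f8: 2 |  | - | - |  | all units
xdwait | f8: 3 |  | - | - |  | all units
??? | f8: 6 |  | - | - |  | all units
xcwait | f8: 7 |  | - | - |  | all units
trap 0 | f8: 8 |  | N/A | - |  | v3+ units
trap 1 | f8: 9 |  | N/A | - |  | v3+ units
trap 2 | f8: a |  | N/A | - |  | v3+ units
trap 3 | f8: b |  | N/A | - |  | v3+ units
push | f9: 0 |  | - | - | 1 | all units
itlb | f9: 8 |  | N/A | - |  | v3+ units
pop | fc: 0 |  | - | - | 1 | all units
mov[>sr] | fe: 0 |  | - | - |  | all units
mov[<sr] | fe: 1 |  | - | - |  | all units
ptlb | fe: 2 |  | N/A | - |  | v3+ units
vtlb | fe: 3 |  | N/A | - |  | v3+ units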


Code segment 
falcon has separate code and data spaces. The code segment, like the data segment, is located in a small piece of SRAM in the microcontroller. Its size can be determined by looking at MMIO address falcon+0x108, bits 0-8 shifted left by 8.


Code is byte-oriented, but can only be accessed by 32-bit words from outside, and can only be modified in 0x100-byte 
[page] units. 
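As a worked example of the size rule and the 0x100-byte page granularity, here is a C sketch. It assumes `caps` already holds the raw 32-bit value read from falcon+0x108 by whatever host-side MMIO access is in use (not shown). The function names are hypothetical.

```c
#include <stdint.h>

/* Code segment size in bytes: bits 0-8 of falcon+0x108, shifted left by 8,
   i.e. the register counts 0x100-byte pages. */
static uint32_t code_size_bytes(uint32_t caps) {
    return (caps & 0x1ff) << 8;
}

/* Split a code address into its 0x100-byte page and offset within it. */
static uint32_t code_page(uint32_t addr)   { return addr >> 8; }
static uint32_t code_offset(uint32_t addr) { return addr & 0xff; }
```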

On v0, the code segment is just a flat piece of RAM, except for the per-page secret flag. See v0 code/data upload registers for information on uploading code and data.


On v3+, the code segment is paged with virtual -> physical translation and needs special handling. See IO space for details.


Code execution is started by host via MMIO from arbitrary entry point, and is stopped either by host or by the 
microcode itself, see Halting microcode execution: exit, Processor execution control registers. 
Invalid opcode handling


When an invalid opcode is hit, $pc is unmodified and a trap is generated. On v3+, the $tstatus reason field is set to 8. v0 engines don't have the $tstatus register, but this is the only trap type they support anyway.


2.10.3 Arithmetic instructions 


Contents 


* Arithmetic instructions
  - Introduction
  - $flags result bits
  - Pseudocode conventions
  - Comparison: cmpu, cmps, cmp
  - Addition/subtraction: add, adc, sub, sbb
  - Shifts: shl, shr, sar, shlc, shrc
  - Unary operations: not, neg, mov, movf, hswap
  - Loading immediates: mov, sethi
  - Clearing registers: clear
  - Setting flags from a value: setf
  - Multiplication: mulu, muls
  - Sign extension: sext
  - Bitfield extraction: extr, extrs
  - Bitfield insertion: ins
  - Bitwise operations: and, or, xor
  - Bit extraction: xbit
  - Bit manipulation: bset, bclr, btgl
  - Division and remainder: div, mod
  - Setting predicates: setp


Introduction 
The arithmetic/logical instructions do operations on $r0-$r15 GPRs, sometimes setting bits in the $flags register according to the result. The instructions can be "sized" or "unsized". Sized instructions have 8-bit, 16-bit, and 32-bit variants.


Unsized instructions don't have variants, and always operate on full 32-bit registers. For 8-bit and 16-bit sized instructions, the high 24 or 16 bits of destination registers are unmodified.
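The rule that 8-bit and 16-bit operations leave the upper destination bits alone amounts to a masked merge. A minimal C sketch; `sized_write` is a hypothetical name for illustration.

```c
#include <stdint.h>

/* Merge a sized result into a 32-bit GPR: only the low sz bits change. */
static uint32_t sized_write(uint32_t old, uint32_t res, int sz) {
    uint32_t mask = (sz == 32) ? 0xffffffffu : (UINT32_C(1) << sz) - 1;
    return (old & ~mask) | (res & mask);
}
```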


$flags result bits 


The $flags bits often affected by ALU instructions are:

* bit 8: c, carry flag. Set by addition instructions iff a carry out of the high bit (or, equivalently, unsigned overflow) has occurred. Likewise set by subtraction instructions iff a borrow into the high bit (or unsigned overflow) has occurred. Also used by shift instructions to store the last shifted out bit. Used as the less-than condition in the old comparisons (cmpu/cmps).
* bit 9: o, signed overflow flag - set by addition, subtraction, comparison, and negation instructions if a signed overflow occurred. Set to 0 by some other instructions.
* bit 10: s, sign flag - set according to the high bit of the result by most arithmetic instructions.
* bit 11: z, zero flag - set iff the result was equal to 0 by most arithmetic instructions.


Also, a few ALU instructions operate on $flags register as a whole. 


Pseudocode conventions 


sz, for sized instructions, is the selected size of operation: 8, 16, or 32.
S(x) evaluates to (x >> (sz - 1) & 1), i.e. the sign bit of x. If insn is unsized, assume sz == 32.


C(a, b, c), where a, b, c are booleans, is the carry flag for an addition where the two inputs have high bits of a and b, and the result has a high bit of c. It is computed as follows:


bool C(bool a, bool b, bool c) {
    // a and b both set - there is always carry out.
    if (a && b)
        return 1;
    // One of a and b is set - there is carry out iff result has high
    // bit 0.
    if ((a || b) && !c)
        return 1;
    // Otherwise (a and b both clear), there is no possibility of carry
    // out.
    return 0;
}


Also, !C(a, !b, c) is the borrow flag for a subtraction where the two inputs have high bits of a and b, and the result has a high bit of c.


Likewise, O(a, b, c) is similarly defined as the signed overflow flag for an addition:


bool O(bool a, bool b, bool c) {
    return a == b && a != c;
    // equivalent definition (check it yourself):
    // return a ^ b ^ c ^ C(a, b, c);
}


Similarly O(a, !b, c) is the signed overflow flag for subtraction.
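The C() and O() definitions, including the "check it yourself" alternative and the subtraction forms !C(a, !b, c) and O(a, !b, c), can be verified exhaustively against real 8-bit arithmetic. This C sketch restates the two functions and compares them with carries, borrows and overflows computed directly; `check_flags_8bit` is a hypothetical helper, and 8 bits is an arbitrary width since the formulas only look at high bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* C() and O() as defined above. */
static bool C(bool a, bool b, bool c) {
    if (a && b)
        return 1;
    if ((a || b) && !c)
        return 1;
    return 0;
}

static bool O(bool a, bool b, bool c) {
    return a == b && a != c;
}

/* Check every pair of 8-bit inputs for both addition and subtraction. */
static bool check_flags_8bit(void) {
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned y = 0; y < 256; y++) {
            bool a = x >> 7 & 1, b = y >> 7 & 1;

            unsigned sum = x + y;
            bool r = sum >> 7 & 1;                 /* high bit of the 8-bit sum */
            int ssum = (int8_t)x + (int8_t)y;
            if (C(a, b, r) != (sum > 0xff))
                return false;                      /* carry out */
            if (O(a, b, r) != (ssum < -128 || ssum > 127))
                return false;                      /* signed overflow */
            if (O(a, b, r) != (a ^ b ^ r ^ C(a, b, r)))
                return false;                      /* alternative definition */

            unsigned diff = (x - y) & 0xff;
            bool d = diff >> 7 & 1;
            int sdiff = (int8_t)x - (int8_t)y;
            if (!C(a, !b, d) != (x < y))
                return false;                      /* borrow */
            if (O(a, !b, d) != (sdiff < -128 || sdiff > 127))
                return false;                      /* signed overflow */
        }
    }
    return true;
}
```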


Comparison: cmpu, cmps, cmp 


Compare two values, setting flags according to the results of the comparison. cmp sets the usual set of 4 flags, and behaves identically to a subtraction instruction that doesn't write its destination register. cmpu sets only c and z, but otherwise behaves like cmp - thus it is only useful for unsigned comparisons. cmps sets z normally, but sets c iff SRC1 is less than SRC2 when treated as signed numbers (thus using unsigned condition codes to store the result of a signed comparison instead).

cmpu/cmps are the only comparison instructions available on Falcon v0. Both of them set only the c and z flags, with cmps setting the c flag in an unusual way to enable signed comparisons while using unsigned flags and condition codes. To do an unsigned comparison, use cmpu and the unsigned branch conditions [b/a/e]. To do a signed comparison, use cmps, also with unsigned branch conditions.


The new Falcon v3+ cmp instruction sets the full set of flags. To do an unsigned comparison on v3+, use cmp and the unsigned branch conditions. To do a signed comparison, use cmp and the signed branch conditions [l/g/e].


Instructions:
Name | Description | Present on | Subopcode
cmpu | compare unsigned | all units | 4
cmps | compare signed | all units | 5
cmp | compare | v3+ units | 6


Instruction class: sized
Execution time: 1 cycle


Operands: SRC1, SRC2


Forms:
Form | Opcode
R2, I8 | 30
R2, I16 | 31
R2, R1 | 38
Immediates:
cmpu: zero-extended
cmps: sign-extended
cmp: sign-extended
Operation:
uint<sz>_t diff = SRC1 - SRC2;
$flags.z = (diff == 0);
if (op == cmps)
    $flags.c = O(S(SRC1), !S(SRC2), S(diff)) ^ S(diff);
else if (op == cmpu)
    $flags.c = !C(S(SRC1), !S(SRC2), S(diff));
else if (op == cmp) {
    $flags.c = !C(S(SRC1), !S(SRC2), S(diff));
    $flags.o = O(S(SRC1), !S(SRC2), S(diff));
    $flags.s = S(diff);
}
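Here is an 8-bit instance of the pseudocode above as runnable C, useful for seeing that cmps really stores "signed less-than" in c. The enum, struct and `cmp8` names are hypothetical. Flags an instruction does not write are left unchanged, and C()/O() are the definitions from the pseudocode conventions.

```c
#include <stdbool.h>
#include <stdint.h>

enum cmp_op { CMPU, CMPS, CMP };
struct falcon_flags { bool c, o, s, z; };   /* hypothetical flag holder */

static bool C(bool a, bool b, bool c) { return (a && b) || ((a || b) && !c); }
static bool O(bool a, bool b, bool c) { return a == b && a != c; }

/* sz = 8 instance of cmpu/cmps/cmp; unwritten flags keep their value. */
static void cmp8(enum cmp_op op, uint8_t src1, uint8_t src2,
                 struct falcon_flags *f) {
    uint8_t diff = (uint8_t)(src1 - src2);
    bool a = src1 >> 7, b = src2 >> 7, d = diff >> 7;
    f->z = (diff == 0);
    if (op == CMPS) {
        f->c = O(a, !b, d) ^ d;      /* signed less-than, stored in c */
    } else {
        f->c = !C(a, !b, d);         /* borrow, i.e. unsigned less-than */
        if (op == CMP) {
            f->o = O(a, !b, d);
            f->s = d;
        }
    }
}
```

For example, comparing 0xff with 1 leaves c clear under cmpu (255 is not below 1) but sets it under cmps (-1 is below 1).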


Addition/subtraction: add, adc, sub, sbb


Add or subtract two values, possibly with carry/borrow. The full set of arithmetic flags is always written.


Instructions:
Name | Description | Subopcode
add | add | 0
adc | add with carry | 1
sub | subtract | 2
sbb | subtract with borrow | 3
Instruction class: sized
Execution time: 1 cycle
Operands: DST, SRC1, SRC2
Forms:
Form | Opcode
R1, R2, I8 | 10
R1, R2, I16 | 20
R2, R2, I8 | 36
R2, R2, I16 | 37
R2, R2, R1 | 3b
R3, R2, R1 | 3c
Immediates: zero-extended
Operation:
uint<sz>_t res;
if (op == add)
    res = SRC1 + SRC2;
else if (op == adc)
    res = SRC1 + SRC2 + $flags.c;
else if (op == sub)
    res = SRC1 - SRC2;
else if (op == sbb)
    res = SRC1 - SRC2 - $flags.c;
if (op == add || op == adc) {
    $flags.c = C(S(SRC1), S(SRC2), S(res));
    $flags.o = O(S(SRC1), S(SRC2), S(res));
} else {
    $flags.c = !C(S(SRC1), !S(SRC2), S(res));
    $flags.o = O(S(SRC1), !S(SRC2), S(res));
}
DST = res;
$flags.s = S(DST);
$flags.z = (DST == 0);
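Because C() and O() only look at high bits, the same formulas stay correct when a carry or borrow comes in. The following 8-bit C sketch (hypothetical names, sz fixed to 8 for brevity) implements the pseudocode above and chains adc/sbb to build a 16-bit add and subtract out of two 8-bit halves.

```c
#include <stdbool.h>
#include <stdint.h>

enum arith_op { ADD, ADC, SUB, SBB };
struct falcon_flags { bool c, o, s, z; };   /* hypothetical flag holder */

static bool C(bool a, bool b, bool c) { return (a && b) || ((a || b) && !c); }
static bool O(bool a, bool b, bool c) { return a == b && a != c; }

/* sz = 8 instance of add/adc/sub/sbb; writes all four flags. */
static uint8_t arith8(enum arith_op op, uint8_t s1, uint8_t s2,
                      struct falcon_flags *f) {
    uint8_t res;
    switch (op) {
    case ADD: res = (uint8_t)(s1 + s2); break;
    case ADC: res = (uint8_t)(s1 + s2 + f->c); break;
    case SUB: res = (uint8_t)(s1 - s2); break;
    default:  res = (uint8_t)(s1 - s2 - f->c); break;   /* SBB */
    }
    bool a = s1 >> 7, b = s2 >> 7, r = res >> 7;
    if (op == ADD || op == ADC) {
        f->c = C(a, b, r);
        f->o = O(a, b, r);
    } else {
        f->c = !C(a, !b, r);     /* borrow */
        f->o = O(a, !b, r);
    }
    f->s = r;
    f->z = (res == 0);
    return res;
}
```

Usage: adding 0x01ff + 0x0001 as `arith8(ADD, 0xff, 0x01, &f)` followed by `arith8(ADC, 0x01, 0x00, &f)` yields the halves 0x00 and 0x02.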


Shifts: shl, shr, sar, shlc, shrc 


Shift a value. For shl/shr, the extra bits "shifted in" are 0. For sar, they're equal to the sign bit of the source. For shlc/shrc, the first such bit is taken from the carry flag, the rest are 0. On Falcon v3+, these instructions set all 4 arithmetic flags - s and z are set as usual, o is always set to 0, and c is set to the value of the last shifted out bit, or 0 if the shift count was 0. On Falcon v0, only c is set.


The shift count is always masked to 3 bits in case of 8-bit shift instructions, 4 bits in case of 16-bit shift instructions, 
and 5 bits in case of 32-bit shift instructions. 
Instructions:
Name | Description | Subopcode
shl | shift left | 4
shr | shift right | 5
sar | shift right with sign bit | 7
shlc | shift left with carry in | c
shrc | shift right with carry in | d
Instruction class: sized
Execution time: 1 cycle
Operands: DST, SRC1, SRC2
Forms:
Form | Opcode
R1, R2, I8 | 10
R2, R2, I8 | 36
R2, R2, R1 | 3b
R3, R2, R1 | 3c
Immediates: truncated
Operation:
unsigned shcnt;
if (sz == 8)
    shcnt = SRC2 & 7;
else if (sz == 16)
    shcnt = SRC2 & 0xf;
else // sz == 32
    shcnt = SRC2 & 0x1f;
uint<sz>_t res;
if (op == shl || op == shlc) {
    res = SRC1 << shcnt;
    if (op == shlc && shcnt != 0)
        res |= $flags.c << (shcnt - 1);
    if (shcnt == 0)
        $flags.c = 0;
    else
        $flags.c = SRC1 >> (sz - shcnt) & 1;
} else { // shr, sar, shrc
    res = SRC1 >> shcnt;
    if (op == shrc && shcnt != 0)
        res |= $flags.c << (sz - shcnt);
    if (op == sar && S(SRC1) && shcnt != 0)
        res |= ~0 << (sz - shcnt);
    if (shcnt == 0)
        $flags.c = 0;
    else
        $flags.c = SRC1 >> (shcnt - 1) & 1;
}
DST = res;
if (falcon_version != 0) {
    $flags.o = 0;
    $flags.s = S(DST);
    $flags.z = (DST == 0);
}
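The following runnable C sketch follows the v3+ behaviour of the pseudocode above for any sz. The v0 variant would only update c. Names are hypothetical. The source is reduced to sz bits first because the pseudocode treats it as a uint<sz>_t, and the sar fill is skipped for a zero count to avoid an over-wide C shift.

```c
#include <stdbool.h>
#include <stdint.h>

enum shift_op { SHL, SHR, SAR, SHLC, SHRC };
struct falcon_flags { bool c, o, s, z; };   /* hypothetical flag holder */

static uint32_t low_mask(int sz) {
    return sz == 32 ? 0xffffffffu : (UINT32_C(1) << sz) - 1;
}

/* v3+ semantics; sz is 8, 16 or 32 and the result is in the low sz bits. */
static uint32_t falcon_shift(enum shift_op op, int sz, uint32_t src,
                             uint32_t cnt, struct falcon_flags *f) {
    uint32_t m = low_mask(sz);
    unsigned sh = cnt & (unsigned)(sz - 1);    /* masked to 3, 4 or 5 bits */
    bool out = false;                           /* last bit shifted out */
    uint32_t res;

    src &= m;
    if (op == SHL || op == SHLC) {
        res = (src << sh) & m;
        if (op == SHLC && sh != 0)
            res |= (uint32_t)f->c << (sh - 1);  /* old carry shifted in */
        if (sh != 0)
            out = src >> (sz - sh) & 1;
    } else {
        res = src >> sh;
        if (op == SHRC && sh != 0)
            res |= (uint32_t)f->c << (sz - sh);
        if (op == SAR && sh != 0 && (src >> (sz - 1) & 1))
            res |= (~UINT32_C(0) << (sz - sh)) & m;   /* copies of sign bit */
        if (sh != 0)
            out = src >> (sh - 1) & 1;
    }
    f->c = out;
    f->o = false;
    f->s = res >> (sz - 1) & 1;
    f->z = (res == 0);
    return res;
}
```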


Unary operations: not, neg, mov, movf, hswap 


not flips all bits in a value. neg negates a value. mov and movf move a value from one register to another. mov is the v3+ variant, which just does the move. movf is the v0 variant, which additionally sets flags according to the moved value. hswap rotates a value by half its size. All instructions except mov set 3 flags: s and z (which are set as usual), as well as o (which is set iff signed overflow occurred for neg, and always set to 0 for other instructions).


Instructions:
Name | Description | Present on | Subopcode
not | bitwise complement | all units | 0
neg | negate a value | all units | 1
movf | move a value and set flags | v0 units | 2
mov | move a value | v3+ units | 2
hswap | swap halves | all units | 3
Instruction class: sized
Execution time: 1 cycle
Operands: DST, SRC
Forms:
Form | Opcode
R1, R2 | 39
R2, R2 | 3d
Operation:
if (op == not) {
    DST = ~SRC;
    $flags.o = 0;
} else if (op == neg) {
    DST = -SRC;
    $flags.o = (DST == 1 << (sz - 1));
} else if (op == movf) {
    DST = SRC;
    $flags.o = 0;
} else if (op == mov) {
    DST = SRC;
} else if (op == hswap) {
    DST = SRC >> (sz / 2) | SRC << (sz / 2);
    $flags.o = 0;
}
if (op != mov) {
    $flags.s = S(DST);
    $flags.z = (DST == 0);
}
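Two cases in the pseudocode above are easy to get wrong. neg overflows only on the most negative value. hswap is a rotate by half the operand width, so on a 32-bit operand it swaps 16-bit halves rather than reversing bytes. Below is a C sketch with hypothetical function names.

```c
#include <stdbool.h>
#include <stdint.h>

/* neg with sz = 16: o is set only when the result is 0x8000. */
static uint16_t neg16(uint16_t src, bool *o) {
    uint16_t dst = (uint16_t)-src;   /* two's complement negation */
    *o = (dst == 0x8000);
    return dst;
}

/* hswap: rotate by half the operand size. */
static uint16_t hswap16(uint16_t src) {
    return (uint16_t)(src >> 8 | src << 8);
}

static uint32_t hswap32(uint32_t src) {
    return src >> 16 | src << 16;
}
```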
Loading immediates: mov, sethi


mov sets a register to an immediate. sethi sets the high 16 bits of a register to an immediate, leaving the low bits untouched. mov can thus be used to load small [16-bit signed] immediates, while mov+sethi can be used to load any 32-bit immediate.


Instructions:
Name | Description | Subopcode
mov | Load an immediate | 7
sethi | Set high bits | 3


Instruction class: unsized
Execution time: 1 cycle


Operands: DST, SRC


Forms:
Form | Opcode
R2, I8 | f0
R2, I16 | f1
Immediates:
mov: sign-extended
sethi: zero-extended
Operation:
if (op == mov)
    DST = SRC;
else if (op == sethi)
    DST = DST & 0xffff | SRC << 16;
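The mov + sethi recipe described above can be checked in C. The helper names are hypothetical, and this sketch always emits both steps, although sethi can be skipped when the wanted high half already equals the sign extension of the low half.

```c
#include <stdint.h>

/* Effect of mov with a 16-bit immediate: sign-extended to 32 bits. */
static uint32_t mov_imm16(uint16_t imm) {
    return (uint32_t)(int32_t)(int16_t)imm;
}

/* Effect of sethi: replace the high 16 bits, keep the low 16 bits. */
static uint32_t sethi(uint32_t dst, uint16_t imm) {
    return (dst & 0xffff) | (uint32_t)imm << 16;
}

/* Load an arbitrary 32-bit constant as a mov + sethi pair. */
static uint32_t load_const(uint32_t value) {
    uint32_t r = mov_imm16((uint16_t)(value & 0xffff));
    return sethi(r, (uint16_t)(value >> 16));
}
```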
Clearing registers: clear

Sets a register to 0.

Instructions:
Name | Description | Subopcode
clear | Clear a register | 4


Instruction class: sized
Operands: DST


Forms:


Form | Opcode
R2 | 3d


Operation:
DST = 0;
Setting flags from a value: setf


Sets the z and s flags according to a value, and sets the o flag to 0.


Instructions:


Name | Description | Present on | Subopcode
setf | Set flags according to a value | v3+ units | 5


Instruction class: sized
Execution time: 1 cycle


Operands: SRC


Forms:
Form | Opcode
R2 | 3d
Operation:
$flags.o = 0;
$flags.s = S(SRC);
$flags.z = (SRC == 0);


Multiplication: mulu, muls 


Does a 16x16 -> 32 multiplication. The inputs are unsigned for mulu, signed for muls. Sets no flags.


Instructions:


Name | Description | Subopcode
mulu | Multiply unsigned | 0
muls | Multiply signed | 1


Instruction class: unsized


Operands: DST, SRC1, SRC2


Forms:
Form | Opcode
R1, R2, I8 | c0
R1, R2, I16 | e0
R2, R2, I8 | f0
R2, R2, I16 | f1
R2, R2, R1 | fd
R3, R2, R1 | ff
Immediates:
mulu: zero-extended
muls: sign-extended


Operation:
s1 = SRC1 & 0xffff;
s2 = SRC2 & 0xffff;
if (op == muls) {
    if (s1 & 0x8000)
        s1 |= 0xffff0000;
    if (s2 & 0x8000)
        s2 |= 0xffff0000;
}
DST = s1 * s2;
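The 16x16 -> 32 rule means only the low half of each source matters, and muls sign-extends those halves before multiplying. Below is a C sketch with hypothetical names.

```c
#include <stdint.h>

/* mulu: unsigned product of the low 16 bits of each source. */
static uint32_t mulu(uint32_t src1, uint32_t src2) {
    return (src1 & 0xffff) * (src2 & 0xffff);
}

/* muls: the low 16 bits of each source are sign-extended first. */
static uint32_t muls(uint32_t src1, uint32_t src2) {
    int32_t s1 = (int16_t)(src1 & 0xffff);
    int32_t s2 = (int16_t)(src2 & 0xffff);
    return (uint32_t)(s1 * s2);   /* at most 2^30 in magnitude, no overflow */
}
```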


Sign extension: sext 


Sign-extends the low (X+1) bits of a value, where X is the second source masked to 5 bits: bit X (counting from the LSB) becomes the new sign bit, and the result equals the source with all bits above bit X replaced by copies of it. Sets s and z flags according to the result.


Instructions:


Name | Description | Subopcode
sext | Sign-extend | 2
Instruction class: unsized
Execution time: 1 cycle
Operands: DST, SRC1, SRC2
Forms:
Form | Opcode
R1, R2, I8 | c0
R2, R2, I8 | f0
R2, R2, R1 | fd
R3, R2, R1 | ff
Immediates: truncated
Operation:
bit = SRC2 & 0x1f;
if (SRC1 & 1 << bit) {
    DST = SRC1 & ((1 << bit) - 1) | -(1 << bit);
} else {
    DST = SRC1 & ((1 << bit) - 1);
}
$flags.s = S(DST);
$flags.z = (DST == 0);
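The sext pseudocode translates almost line for line into C once unsigned 32-bit arithmetic is used. This sketch uses the hypothetical name `sext`. The `-(1 << bit)` term becomes an unsigned negation, which sets the sign bit and everything above it.

```c
#include <stdint.h>

/* sext: bit (src2 & 0x1f) of src1 becomes the new sign bit. */
static uint32_t sext(uint32_t src1, uint32_t src2) {
    unsigned bit = src2 & 0x1f;
    uint32_t low = src1 & ((UINT32_C(1) << bit) - 1);   /* bits below the sign bit */
    if (src1 >> bit & 1)
        return low | -(UINT32_C(1) << bit);             /* sign bit and all above */
    return low;
}
```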




Bitfield extraction: extr, extrs 


Extracts a bitfield. The bitfield to extract is given as a pair of (low bit index, size in bits - 1) packed in a single 10-bit source, with each part taking 5 bits: the low bit index in bits 0-4 and the size minus 1 in bits 5-9. The value of the bitfield is returned in the low bits of the destination register. extr extracts an unsigned bitfield, setting the remaining destination bits to 0, while extrs extracts a signed bitfield, setting the remaining bits to a copy of the sign bit (i.e. the highest bit of the bitfield).


Both instructions set s and z flags. While z is set as usual, s is set to the "fill" bit used for high bits of the destination - thus it is always 0 for extr.


Instructions:
Name | Description | Present on | Subopcode
extrs | Extract signed bitfield | v3+ units | 3
extr | Extract unsigned bitfield | v3+ units | 7


Instruction class: unsized
Execution time: 1 cycle


Operands: DST, SRC1, SRC2


Forms:
Form | Opcode
R1, R2, I8 | c0
R1, R2, I16 | e0
R3, R2, R1 | ff
Immediates: zero-extended
Operation:
int low = SRC2 & 0x1f;
int sizem1 = (SRC2 >> 5 & 0x1f);
uint32_t bf = (SRC1 >> low) & ((2 << sizem1) - 1);
bool fill_bit;
if (op == extr) {
    fill_bit = 0;
} else if (op == extrs) {
    // depending on the mask is probably a bad idea.
    int signbit = (low + sizem1) & 0x1f;
    fill_bit = SRC1 >> signbit & 1;
}
if (fill_bit)
    bf |= -(2 << sizem1);
DST = bf;
$flags.s = fill_bit;
$flags.z = (DST == 0);
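Below is a C sketch of extr/extrs together with a helper that packs the 10-bit field specifier. Both names are hypothetical. The mask `(2 << sizem1) - 1` is computed in unsigned 32-bit arithmetic, where a 32-bit field (sizem1 = 31) wraps to all ones, as the pseudocode intends.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack (low bit index, size in bits) into the 10-bit SRC2 format. */
static uint32_t bf_spec(unsigned low, unsigned size) {
    return (low & 0x1f) | ((size - 1) & 0x1f) << 5;
}

/* extr (is_signed = false) / extrs (is_signed = true);
   *s and *z receive the s and z flags. */
static uint32_t falcon_extr(bool is_signed, uint32_t src1, uint32_t src2,
                            bool *s, bool *z) {
    unsigned low = src2 & 0x1f;
    unsigned sizem1 = src2 >> 5 & 0x1f;
    uint32_t mask = (UINT32_C(2) << sizem1) - 1;   /* sizem1 + 1 ones */
    uint32_t bf = (src1 >> low) & mask;
    bool fill = is_signed && (src1 >> ((low + sizem1) & 0x1f) & 1);
    if (fill)
        bf |= ~mask;                               /* same as -(2 << sizem1) */
    *s = fill;
    *z = (bf == 0);
    return bf;
}
```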


Bitfield insertion: ins 


Inserts a bitfield, which is specified like for extr/extrs. Sets no flags.


Instructions:
Name | Description | Present on | Subopcode
ins | Insert a bitfield | v3+ units | b
Instruction class: unsized
Execution time: 1 cycle
Operands: DST, SRC1, SRC2
Forms:
Form | Opcode
R1, R2, I8 | c0
R1, R2, I16 | e0


Immediates: zero-extended.
Operation:
low = SRC2 & 0x1f;
size = (SRC2 >> 5 & 0x1f) + 1;
if (low + size <= 32) { // nop if bitfield out of bounds - I wouldn't depend on it, though...
    DST &= ~(((1 << size) - 1) << low); // clear the current contents of the bitfield
    bf = SRC1 & ((1 << size) - 1);
    DST |= bf << low;
}
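The following C sketch (hypothetical names) implements ins with the same field specifier as extr/extrs. It special-cases the full 32-bit field because `(1 << 32) - 1` is not valid C, whereas the pseudocode treats it as all ones.

```c
#include <stdint.h>

/* Pack (low bit index, size in bits) into the 10-bit SRC2 format. */
static uint32_t bf_spec(unsigned low, unsigned size) {
    return (low & 0x1f) | ((size - 1) & 0x1f) << 5;
}

/* ins: out-of-range fields (low + size > 32) leave dst unchanged. */
static uint32_t falcon_ins(uint32_t dst, uint32_t src1, uint32_t src2) {
    unsigned low = src2 & 0x1f;
    unsigned size = (src2 >> 5 & 0x1f) + 1;
    if (low + size > 32)
        return dst;
    uint32_t mask = (size == 32) ? 0xffffffffu : (UINT32_C(1) << size) - 1;
    dst &= ~(mask << low);          /* clear the current field contents */
    dst |= (src1 & mask) << low;    /* insert the new value */
    return dst;
}
```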


Bitwise operations: and, or, xor 


Ands, ors, or xors two operands. On Falcon v0, sets no flags. On Falcon v3+, sets all flags - s and z are set as usual, c and o are always set to 0.


Instructions:
Name | Description | Subopcode
and | Bitwise and | 4
or | Bitwise or | 5
xor | Bitwise xor | 6
Instruction class: unsized
Execution time: 1 cycle
Operands: DST, SRC1, SRC2
Forms:
Form | Opcode
R1, R2, I8 | c0
R1, R2, I16 | e0
R2, R2, I8 | f0
R2, R2, I16 | f1
R2, R2, R1 | fd
R3, R2, R1 | ff
Immediates: zero-extended


Operation:
if (op == and)
    DST = SRC1 & SRC2;
else if (op == or)
    DST = SRC1 | SRC2;
else if (op == xor)
    DST = SRC1 ^ SRC2;
if (falcon_version != 0) {
    $flags.c = 0;
    $flags.o = 0;
    $flags.s = S(DST);
    $flags.z = (DST == 0);
}


Bit extraction: xbit 


Extracts a single bit of a specified register. On Falcon v0, the bit is stored to bit 0 of DST, while other destination bits are unmodified, and no flags are set. On Falcon v3+, the bit is stored to bit 0 of DST, all other bits of DST are set to 0, the s flag is set to 0, and the z flag is set iff the extracted bit was 0 (behaving exactly like an extr instruction with size 1). In both cases, the bit index is masked off to 5 bits.


Instructions:


Name | Description | Subopcode - opcodes c0, ff | Subopcode - opcodes f0, fe
xbit | Extract a bit | 8 | c


Instruction class: unsized
Execution time: 1 cycle
Operands: DST, SRC1, SRC2


Forms:


Form | Opcode
R1, R2, I8 | c0
R3, R2, R1 | ff
R2, $flags, I8 | f0
R1, $flags, R2 | fe


Immediates: truncated


Operation:
bit = SRC2 & 0x1f;
if (falcon_version == 0) {
    DST = DST & ~1 | (SRC1 >> bit & 1);
} else {
    DST = SRC1 >> bit & 1;
    $flags.s = 0;
    $flags.z = (DST == 0);
}
Bit manipulation: bset, bclr, btgl


Set, clear, or flip a specified bit of a register. The requested bit index is masked off to 5 bits. No flags are set.


Instructions:
Name | Description | Subopcode - opcodes f0, fd, f9 | Subopcode - opcode f4
bset | Set a bit | 9 | 31
bclr | Clear a bit | a | 32
btgl | Flip a bit | b | 33


Instruction class: unsized
Execution time: 1 cycle


Operands: DST, SRC


Forms:
Form | Opcode
R2, I8 | f0
R2, R1 | fd
$flags, I8 | f4
$flags, R2 | f9


Immediates: truncated


Operation:
bit = SRC & 0x1f;
if (op == bset)
    DST |= 1 << bit;
else if (op == bclr)
    DST &= ~(1 << bit);
else // op == btgl
    DST ^= 1 << bit;


Division and remainder: div, mod 


Does unsigned 32-bit division / modulus. Sets no flags. If a division by 0 is requested, no exception happens - the division result is always 0xffffffff in this case, and the modulus result is equal to the first source.


Instructions:


Name | Description | Present on | Subopcode
div | Divide | v3+ units | c
mod | Take modulus | v3+ units | d


Instruction class: unsized
Execution time: 30-33 cycles


Operands: DST, SRC1, SRC2
Forms:


Form | Opcode
R1, R2, I8 | c0
R1, R2, I16 | e0
R3, R2, R1 | ff


Immediates: zero-extended


Operation:
if (SRC2 == 0) {
    dres = 0xffffffff;
} else {
    dres = SRC1 / SRC2;
}
if (op == div)
    DST = dres;
else // op == mod
    DST = SRC1 - dres * SRC2;
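The divide-by-zero behaviour follows directly from computing mod as SRC1 - dres * SRC2. When SRC2 is 0 the product is 0 and the result is SRC1. Below is a C sketch with hypothetical function names.

```c
#include <stdint.h>

/* div: 0xffffffff on division by zero, as documented above. */
static uint32_t falcon_div(uint32_t a, uint32_t b) {
    return b == 0 ? 0xffffffffu : a / b;
}

/* mod: SRC1 - dres * SRC2; for b == 0 this yields a. */
static uint32_t falcon_mod(uint32_t a, uint32_t b) {
    return a - falcon_div(a, b) * b;
}
```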


Setting predicates: setp 


Sets bit #SRC2 in $flags to bit 0 of SRC1. The bit index is masked off to 5 bits.


Instructions:
Name | Description | Subopcode
setp | Set predicate | 8
Instruction class: unsized
Execution time: 1 cycle
Operands: SRC1, SRC2
Forms:
Form | Opcode
R2, I8 | f2
R2, R1 | fa
Immediates: truncated
Operation:
bit = SRC2 & 0x1f;
$flags = ($flags & ~(1 << bit)) | (SRC1 & 1) << bit;


2.10.4 Data space 
Contents

* Data space
  - Introduction
  - The stack
  - Pseudocode conventions
  - Load: ld
  - Store: st
  - Push onto stack: push
  - Pop from stack: pop
  - Adjust stack pointer: add
  - Accessing data segment through IO


Todo: document UAS


Introduction 
The data segment of the falcon is inside the microcontroller itself. Its size can be determined by looking at the UC_CAPS register, bits 9-16, shifted left by 8.


The segment has byte-oriented addressing and can be accessed in units of 8, 16, or 32 bits. Unaligned accesses are not 
supported and cause botched reads or writes. 


Multi-byte quantities are stored as little-endian. 


The stack 
The stack is also stored in the data segment. The stack pointer is stored in the $sp special register and is always aligned to 4 bytes.


The stack grows downwards, with $sp pointing at the last pushed value. The low 2 bits of $sp and bits higher than what's needed to span the data space are forced to 0.
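The data segment size rule and the $sp masking can be expressed together in C. The register read itself is not shown: `caps` is assumed to be the raw UC_CAPS value. "Bits needed to span the data space" is interpreted here as the smallest all-ones mask covering the segment size. That interpretation and all helper names are assumptions of this sketch.

```c
#include <stdint.h>

/* Data segment size: UC_CAPS bits 9-16, shifted left by 8. */
static uint32_t data_size_bytes(uint32_t caps) {
    return (caps >> 9 & 0xff) << 8;
}

/* Smallest all-ones mask covering every address below size
   (assumed interpretation of "bits needed to span the data space"). */
static uint32_t span_mask(uint32_t size) {
    uint32_t m = 0;
    while (m < size - 1)
        m = m << 1 | 1;
    return m;
}

/* Value $sp would hold after `value` is written to it: the low 2 bits
   and the bits beyond the data space are forced to 0. */
static uint32_t sp_effective(uint32_t value, uint32_t data_size) {
    return value & span_mask(data_size) & ~UINT32_C(3);
}
```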


Pseudocode conventions 


sz, for sized instructions, is the selected size of operation: 8, 16, or 32.


LD(size, address) returns the contents of the size-bit quantity in the data segment at the specified address:


int LD(size, addr) {
    if (size == 32) {
        addr &= ~3;
        return D[addr] | D[addr + 1] << 8 | D[addr + 2] << 16 | D[addr + 3] << 24;
    } else if (size == 16) {
        addr &= ~1;
        return D[addr] | D[addr + 1] << 8;
    } else { // size == 8
        return D[addr];
    }
}


ST(size, address, value) stores the given size-bit value to the data segment:


void ST(size, addr, val) {
    if (size == 32) {
        if (addr & 1) { // fuck up the written datum as penalty for unaligned access.
            val = (val & 0xff) << (addr & 3) * 8;
        } else if (addr & 2) {
            val = (val & 0xffff) << (addr & 3) * 8;
        }
        addr &= ~3;
        D[addr] = val;
        D[addr + 1] = val >> 8;
        D[addr + 2] = val >> 16;
        D[addr + 3] = val >> 24;
    } else if (size == 16) {
        if (addr & 1) {
            val = (val & 0xff) << (addr & 1) * 8;
        }
        addr &= ~1;
        D[addr] = val;
        D[addr + 1] = val >> 8;
    } else { // size == 8
        D[addr] = val;
    }
}
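The LD/ST pseudocode can be run as-is once D[] is given a concrete size. The C sketch below assumes a hypothetical 4 KiB data segment. It masks addresses into that range only to stay in bounds, so real hardware address decoding is not modelled. It reproduces the documented mangling of unaligned stores.

```c
#include <stdint.h>

#define DSIZE 0x1000          /* hypothetical data segment size */
static uint8_t D[DSIZE];

static uint32_t LD(int size, uint32_t addr) {
    addr &= DSIZE - 1;        /* sketch-only bounds guard */
    if (size == 32) {
        addr &= ~3u;
        return D[addr] | D[addr + 1] << 8 | D[addr + 2] << 16 |
               (uint32_t)D[addr + 3] << 24;
    } else if (size == 16) {
        addr &= ~1u;
        return D[addr] | D[addr + 1] << 8;
    }
    return D[addr];           /* size == 8 */
}

static void ST(int size, uint32_t addr, uint32_t val) {
    addr &= DSIZE - 1;        /* sketch-only bounds guard */
    if (size == 32) {
        if (addr & 1)         /* unaligned: only the low byte survives, shifted */
            val = (val & 0xff) << (addr & 3) * 8;
        else if (addr & 2)
            val = (val & 0xffff) << (addr & 3) * 8;
        addr &= ~3u;
        D[addr]     = (uint8_t)val;
        D[addr + 1] = (uint8_t)(val >> 8);
        D[addr + 2] = (uint8_t)(val >> 16);
        D[addr + 3] = (uint8_t)(val >> 24);
    } else if (size == 16) {
        if (addr & 1)
            val = (val & 0xff) << (addr & 1) * 8;
        addr &= ~1u;
        D[addr]     = (uint8_t)val;
        D[addr + 1] = (uint8_t)(val >> 8);
    } else {                  /* size == 8 */
        D[addr] = (uint8_t)val;
    }
}
```

For instance, a 32-bit store of 0x12345678 to address 0x21 leaves 0x00007800 in the aligned word at 0x20, which shows why unaligned accesses are described as botched.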


Load: ld


Loads an 8-bit, 16-bit or 32-bit quantity from the data segment to a register.


Instructions:


Name | Description | Subopcode - normal | Subopcode - with $sp
ld | Load a value from data segment | 8 | 0


Instruction class: sized


Operands: DST, BASE, IDX


Forms:
Form | Opcode
R1, R2, I8 | 10
R2, $sp, I8 | 34
R2, $sp, R1 | 3a
R3, R2, R1 | 3c


Immediates: zero-extended


Operation:
DST = LD(sz, BASE + IDX * (sz/8));


Store: st 


Stores an 8-bit, 16-bit or 32-bit quantity from a register to the data segment.


Instructions:


Name | Description | Subopcode - normal | Subopcode - with $sp
st | Store a value to data segment | 0 | 1


Instruction class: sized


Operands: BASE, IDX, SRC


Forms:
Form | Opcode
R2, I8, R1 | 00
$sp, I8, R2 | 30
R2, 0, R1 | 38
$sp, R1, R2 | 38


Immediates: zero-extended


Operation:


ST(sz, BASE + IDX * (sz/8), SRC);


Push onto stack: push 


Decrements $sp by 4, then stores a 32-bit value at top of the stack. 


Instructions: 


Name | Description Subopcode 
push Push a value onto stack | 0 


Instruction class: unsized 


Operands: SRC 


Forms: 
Form | Opcode
R2 | f9
Operation:
$sp -= 4;
ST(32, $sp, SRC);
Pop from stack: pop


Loads a 32-bit value from the top of the stack, then increments $sp by 4.


Instructions:


Name | Description | Subopcode
pop | Pops a value from the stack | 0


Instruction class: unsized


Operands: DST


Forms:
Form | Opcode
R2 | fc
Operation:
DST = LD(32, $sp);
$sp += 4;
Adjust stack pointer: add 
Adds a value to the stack pointer. 
Instructions: 
Name | Description Subopcode - opcodes 14, f5 | Subopcode - opcode f9 
add Add a value to the stack pointer. | 30 1 


Instruction class: unsized 


Operands: DST, SRC 


Forms: 
Form     | Opcode
$sp, I8  | f4
$sp, I16 | f5
$sp, R2  | f9


Immediates: sign-extended 


Operation: 


$sp += SRC; 


Accessing data segment through IO 


On v3+, the data segment is accessible through normal IO space via index/data register pairs. The number of available index/data pairs is given by the UC_CAPS2 register. This number is 4 on PDAEMON and 1 on other engines:
MMIO 0x1c0 + i * 8 / I[0x07000 + i * 0x200]: DATA_INDEX Selects the place in the data segment accessed by the DATA register. Bits:

* bits 2-15: bits 2-15 of the data address to poke
* bit 24: write autoincrement flag: if set, every write to the corresponding DATA register increments the address by 4
* bit 25: read autoincrement flag: like 24, but for reads


MMIO 0x1c4 + i * 8 / I[0x07100 + i * 0x200]: DATA Writes execute ST(32, DATA_INDEX & 0xfffc, value); and increment the address if write autoincrement is enabled. Reads return the result of LD(32, DATA_INDEX & 0xfffc); and increment if read autoincrement is enabled.


i should be less than the DATA_PORTS value from the UC_CAPS2 register.


On v0, the data segment is instead accessible through the high falcon MMIO range; see v0 code/data upload registers
for details. 
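The behaviour of one DATA_INDEX/DATA pair can be modelled as below. This is a hypothetical sketch, not a driver API: it assumes a 0x10000-byte segment (matching the 16-bit address field) and wraps the address within bits 2-15 on autoincrement, which the text does not specify. The flag bits of DATA_INDEX are preserved when the address advances.

```python
class DataPort:
    """Hypothetical model of one DATA_INDEX/DATA register pair (v3+)."""
    WRITE_AUTOINC = 1 << 24
    READ_AUTOINC = 1 << 25

    def __init__(self, dseg: bytearray):
        self.dseg = dseg          # assumed to be 0x10000 bytes long
        self.index = 0            # current DATA_INDEX value

    def _addr(self) -> int:
        return self.index & 0xfffc   # bits 2-15 select a 32-bit word

    def _advance(self) -> None:
        # Only the address bits move; the autoincrement flags stay set.
        addr = (self._addr() + 4) & 0xfffc
        self.index = (self.index & ~0xfffc) | addr

    def write_data(self, value: int) -> None:
        a = self._addr()
        self.dseg[a:a + 4] = (value & 0xffffffff).to_bytes(4, "little")
        if self.index & self.WRITE_AUTOINC:
            self._advance()

    def read_data(self) -> int:
        a = self._addr()
        value = int.from_bytes(self.dseg[a:a + 4], "little")
        if self.index & self.READ_AUTOINC:
            self._advance()
        return value
```

With write autoincrement set, a block of words can be streamed into the segment by setting DATA_INDEX once and writing DATA repeatedly.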


2.10.5 Branch instructions 


Contents 


* Branch instructions
- Introduction
- $pc register
- Pseudocode conventions
- Conditional branch: bra
- Unconditional branch: jmp
- Subroutine call: call
- Subroutine return: ret


Todo: document ljmp/lcall 


Introduction 


The flow control instructions on Falcon include conditional relative branches, unconditional absolute branches, absolute calls, and returns. Calls use the stack in the data segment to store return addresses [see The stack]. The conditions available for branching are based on the low 12 bits of the $flags register:


* bits 0-7: p0-p7, general-purpose predicates
* bit 8: c, carry flag
* bit 9: o, signed overflow flag
* bit 10: s, sign flag
* bit 11: z, zero flag


The c, o, s and z flags are set automatically by many ALU instructions, while p0-p7 have to be manipulated explicitly. See $flags result bits for more details.
When a branching instruction is taken, the execution time is either 4 or 5 cycles, depending on the address of the next instruction to be executed. If that instruction can be loaded in one cycle (it is contained in a single aligned 32-bit memory block in the code section), 4 cycles are needed. If the instruction is split across two blocks, 5 cycles are needed.
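The 4-versus-5 cycle rule can be expressed as a block-index comparison. In this minimal sketch the function and parameter names are hypothetical; it assumes the next instruction occupies insn_len consecutive bytes starting at target, and it only covers taken branches (the not-taken case is listed as 1 cycle for bra).

```python
def taken_branch_cycles(target: int, insn_len: int) -> int:
    """Cycles for a taken branch whose next instruction starts at `target`.

    The instruction fits in one aligned 32-bit block iff its first and
    last bytes fall in the same block (address >> 2).
    """
    first_block = target >> 2
    last_block = (target + insn_len - 1) >> 2
    return 4 if first_block == last_block else 5
```

For example, a 3-byte instruction at 0x101 fits in the block 0x100-0x103 (4 cycles), while the same instruction at 0x102 spills into the next block (5 cycles).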


$pc register 


Address of the current instruction is always available through the read-only $pc special register. 


Pseudocode conventions 


$pc is usually incremented automatically by the opcode length after each instruction; documentation for other kinds of instructions doesn't mention this explicitly for each insn. However, due to the nature of this category of instructions, all effects on $pc are mentioned explicitly in this file.


oplen is the length of the currently executed instruction in bytes. 
See also the conventions for data instructions.


Conditional branch: bra


Branches to a given location if the condition evaluates to true. Target is $pc-relative. 


Instructions: 

Name    | Description                | Present on | Subopcode
bra pX  | if predicate true          | all units  | 00+X
bra c   | if carry                   | all units  | 08
bra b   | if unsigned below          | all units  | 08
bra o   | if overflow                | all units  | 09
bra s   | if sign set / negative     | all units  | 0a
bra z   | if zero                    | all units  | 0b
bra e   | if equal                   | all units  | 0b
bra a   | if unsigned above          | all units  | 0c
bra na  | if not unsigned above      | all units  | 0d
bra be  | if unsigned below or equal | all units  | 0d
bra     | always                     | all units  | 0e
bra npX | if predicate false         | all units  | 10+X
bra nc  | if not carry               | all units  | 18
bra nb  | if not unsigned below      | all units  | 18
bra ae  | if unsigned above or equal | all units  | 18
bra no  | if not overflow            | all units  | 19
bra ns  | if sign unset / positive   | all units  | 1a
bra nz  | if not zero                | all units  | 1b
bra ne  | if not equal               | all units  | 1b
bra g   | if signed greater          | v3+ units  | 1c
bra le  | if signed less or equal    | v3+ units  | 1d
bra l   | if signed less             | v3+ units  | 1e
bra ge  | if signed greater or equal | v3+ units  | 1f


Instruction class: unsized 


Execution time: 1 cycle if not taken, 4-5 cycles if taken 
Operands: DIFF


Forms:
Form | Opcode
I8   | f4
I16  | f5


Immediates: sign-extended


Operation:
switch (cc) {
    case $pX: // $p0..$p7
        cond = $flags.$pX;
        break;
    case c:
        cond = $flags.c;
        break;
    case o:
        cond = $flags.o;
        break;
    case s:
        cond = $flags.s;
        break;
    case z:
        cond = $flags.z;
        break;
    case a:
        cond = !$flags.c && !$flags.z;
        break;
    case na:
        cond = $flags.c || $flags.z;
        break;
    case (none):
        cond = 1;
        break;
    case not $pX: // $p0..$p7
        cond = !$flags.$pX;
        break;
    case nc:
        cond = !$flags.c;
        break;
    case no:
        cond = !$flags.o;
        break;
    case ns:
        cond = !$flags.s;
        break;
    case nz:
        cond = !$flags.z;
        break;
    case g:
        cond = !($flags.o ^ $flags.s) && !$flags.z;
        break;
    case le:
        cond = ($flags.o ^ $flags.s) || $flags.z;
        break;
    case l:
        cond = $flags.o ^ $flags.s;
        break;
    case ge:
        cond = !($flags.o ^ $flags.s);
        break;
}
if (cond)
    $pc = $pc + DIFF;
else
    $pc = $pc + oplen;
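The condition table and switch above can be cross-checked with an executable sketch. It assumes $flags is given as an integer laid out as in the flag list at the start of this section (p0-p7 in bits 0-7, c/o/s/z in bits 8-11); the function names and the string spelling of condition codes are hypothetical. Aliases that share a subopcode (b/c, e/z, be/na, nb/ae/nc, ne/nz) are folded onto one entry.

```python
# Bit positions of the condition flags in $flags.
C_BIT, O_BIT, S_BIT, Z_BIT = 8, 9, 10, 11

# Mnemonics that share a subopcode with another condition.
ALIASES = {"b": "c", "e": "z", "be": "na", "nb": "nc", "ae": "nc", "ne": "nz"}


def flag(flags: int, bit: int) -> bool:
    return bool(flags >> bit & 1)


def bra_taken(cc: str, flags: int) -> bool:
    """Return True if `bra cc` would be taken; cc is "" for the always form."""
    cc = ALIASES.get(cc, cc)
    if cc.startswith("np"):        # npX: predicate X false
        return not flag(flags, int(cc[2:]))
    if cc.startswith("p"):         # pX: predicate X true
        return flag(flags, int(cc[1:]))
    c, o, s, z = (flag(flags, b) for b in (C_BIT, O_BIT, S_BIT, Z_BIT))
    conds = {
        "c": c, "o": o, "s": s, "z": z,
        "a": not c and not z,
        "na": c or z,
        "": True,
        "nc": not c, "no": not o, "ns": not s, "nz": not z,
        "g": not (o ^ s) and not z,   # v3+ only
        "le": (o ^ s) or z,           # v3+ only
        "l": o ^ s,                   # v3+ only
        "ge": not (o ^ s),            # v3+ only
    }
    return conds[cc]


def bra_next_pc(pc: int, cc: str, flags: int, diff: int, oplen: int) -> int:
    """Next $pc after bra: $pc-relative target if taken, else fall through."""
    return pc + diff if bra_taken(cc, flags) else pc + oplen
```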


Unconditional branch: jmp 


Branches to the target. The target is specified as an absolute address. Yes, the immediate forms are pretty much redundant with the relative branch form.


Instructions: 


Name | Description        | Subopcode - opcodes f4, f5 | Subopcode - opcode f9
jmp  | Unconditional jump | 20                         | 4


Instruction class: unsized 
Execution time: 4-5 cycles 


Operands: TRG 


Forms: 
Form | Opcode
I8   | f4
I16  | f5
R2   | f9


Immediates: zero-extended 


Operation:


$pc = TRG;


Subroutine call: call 


Pushes the return address onto the stack and branches to the target. The target is specified as an absolute address.


Instructions: 


Name | Description       | Subopcode - opcodes f4, f5 | Subopcode - opcode f9
call | Call a subroutine | 21                         | 5


Instruction class: unsized 


Execution time: 4-5 cycles 
Operands: TRG


Forms: 
Form | Opcode
I8   | f4
I16  | f5
R2   | f9
Immediates: zero-extended
Operation:
$sp -= 4;
ST(32, $sp, $pc + oplen);
$pc = TRG;
Subroutine return: ret 
Returns from a previous call. 
Instructions: 
Name | Description              | Subopcode
ret  | Return from a subroutine | 0
Instruction class: unsized
Execution time: 5-6 cycles
Operands: [none]
Forms:
Form          | Opcode
[no operands] | f8
Operation: 


$pc = LD(32, $sp);
$sp += 4;
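How call and ret cooperate through the stack can be seen in a toy model. This is a hypothetical sketch: memory is a sparse dict of 32-bit words rather than a real data segment, the initial $sp is arbitrary, and execution timing is ignored. Note that the pushed return address is $pc + oplen, i.e. the instruction after the call.

```python
class CallModel:
    """Toy model of call/ret; memory is a sparse dict of 32-bit words."""

    def __init__(self, pc: int = 0, sp: int = 0x1000):
        self.pc = pc
        self.sp = sp
        self.mem = {}

    def call(self, target: int, oplen: int) -> None:
        self.sp -= 4
        self.mem[self.sp] = self.pc + oplen   # return to the next instruction
        self.pc = target

    def ret(self) -> None:
        self.pc = self.mem[self.sp]
        self.sp += 4
```

Nested calls push successive return addresses at decreasing stack addresses, and each ret pops the most recent one.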


2.10.6 Processor control 


Contents 


* Processor control
- Introduction
- Execution state
  * The EXIT interrupt
  * Halting microcode execution: exit
  * Waiting for interrupts: sleep
  * Processor execution control registers
- Accessing special registers: mov
- Processor capability readout


Todo: write me 


Introduction 


Todo: write me 


Execution state 


The falcon processor can be in one of three states: 
* RUNNING: processor is actively executing instructions 
* STOPPED: no instructions are being executed, interrupts are ignored 
* SLEEPING: no instructions are being executed, but interrupts can restart execution 


The state can be changed as follows: 


From     | To       | Cause
any      | STOPPED  | Reset [non-crypto]
any      | RUNNING  | Reset [crypto]
STOPPED  | RUNNING  | Start by UC_CTRL
RUNNING  | STOPPED  | Exit instruction
RUNNING  | STOPPED  | Double trap
RUNNING  | SLEEPING | Sleep instruction
SLEEPING | RUNNING  | Interrupt


The EXIT interrupt 


Whenever falcon execution state is changed to STOPPED for any reason other than reset (exit instruction, double trap, 
or the crypto reset scrubber finishing), falcon interrupt line 4 is active for one cycle (triggering the EXIT interrupt if 
it’s set to level mode). 
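The transition table can be read as a small state machine. The Python sketch below encodes only the listed transitions; the event names are hypothetical labels, events not listed are assumed to leave the state unchanged, and the second return value marks the one-cycle pulse on interrupt line 4 described for the EXIT interrupt. The crypto reset scrubber case is not modelled.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    SLEEPING = "sleeping"


# (from_state, event) -> to_state; None as from_state means "any".
TRANSITIONS = {
    (None, "reset_noncrypto"): State.STOPPED,
    (None, "reset_crypto"): State.RUNNING,
    (State.STOPPED, "start"): State.RUNNING,
    (State.RUNNING, "exit"): State.STOPPED,
    (State.RUNNING, "double_trap"): State.STOPPED,
    (State.RUNNING, "sleep"): State.SLEEPING,
    (State.SLEEPING, "interrupt"): State.RUNNING,
}


def step(state: State, event: str):
    """Return (new_state, exit_line_pulsed) for one event."""
    new = TRANSITIONS.get((None, event))
    if new is None:
        new = TRANSITIONS.get((state, event))
    if new is None:
        return state, False          # e.g. interrupts are ignored when STOPPED
    # Line 4 pulses on entering STOPPED for any reason other than reset.
    pulse = new is State.STOPPED and not event.startswith("reset")
    return new, pulse
```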


Halting microcode execution: exit 


Halts microcode execution and raises the EXIT interrupt.


Instructions: 
Name | Description              | Subopcode
exit | Halt microcode execution | 2
Instruction class: unsized
Operands: [none]
Forms:
Form          | Opcode
[no operands] | f8
Operation: 
EXIT; 


Waiting for interrupts: sleep 


If the $flags bit given as the argument is set, puts the microprocessor in the sleep state until an unmasked interrupt is received. Otherwise, it is a nop. If interrupted, the return pointer will point to the start of the sleep instruction, so the instruction is restarted unless the $flags bit has been cleared.


Instructions: 


Name  | Description         | Subopcode
sleep | Wait for interrupts | 28


Instruction class: unsized 
Operands: FLAG 


Forms: 


Form | Opcode
I8   | f4


Operation: 


if ($flags & 1 << FLAG)
state = SLEEPING; 


Processor execution control registers 


Todo: write me 


Accessing special registers: mov 


Todo: write me 
Processor capability readout


Todo: write me 


2.10.7 Code virtual memory 


Contents 


* Code virtual memory
- Introduction
- TLB operations: PTLB, VTLB, ITLB
  * Executing TLB operations through IO
  * TLB readout instructions: ptlb, vtlb
  * TLB invalidation instruction: itlb
- VM usage on code execution
- Code upload and peeking


Introduction 
On v3+, the falcon code segment uses primitive paging/VM via a simple reverse page table. The page size is 0x100 bytes.


The physical<->virtual address mapping information is stored in hidden TLB memory. There is one TLB cell for each physical code page, and it specifies the virtual address corresponding to it, together with some flags. The flags are:


* bit 0: usable. Set if page is mapped and complete. 
* bit 1: busy. Set if page is mapped, but is still being uploaded. 


* bit 2: secret. Set if page contains secret code. [see Cryptographic coprocessor] 


Todo: check interaction of secret / usable flags and entering/exiting auth mode


A TLB entry is considered valid if any of the three flags is set. Whenever a virtual address is accessed, the TLBs are scanned for a valid entry with a matching virtual address. The physical page whose TLB matched is then used to complete the access. It's an error if no page matched, or if more than one page matched.


The number of physical pages in the code segment can be determined by looking at the UC_CAPS register, bits 0-8. The number of usable bits in the virtual page index can be determined by looking at the UC_CAPS2 register, bits 16-19. I.e. valid virtual addresses of pages are 0 .. (1 << UC_CAPS2[16:19]) * 0x100.


The TLBs can be modified/accessed in 6 ways: 
* executing code - reads TLB corresponding to current $pc 
* PTLB - looks up TLB for a given physical page 
* VTLB - looks up TLB for a given virtual page 
* ITLB - invalidates TLB of a given physical page
* uploading code via IO access window 
* uploading code via xfer 


We'll denote the flags of TLB entry of physical page i as TLB[i].flags, and the virtual page index as TLB[i].virt. 


TLB operations: PTLB, VTLB, ITLB 


These operations take 24-bit parameters and, except for ITLB, return a 32-bit result. They can be called from falcon microcode as instructions, or through IO ports.


ITLB(physidx) clears the TLB entry corresponding to a specified physical page. The page is specified as page index. 
ITLB, however, cannot clear pages containing secret code - the page has to be reuploaded from scratch with non-secret 
data first. 


void ITLB(b24 physidx) {
    if (!(TLB[physidx].flags & 4)) {
        TLB[physidx].flags = 0;
        TLB[physidx].virt = 0;
    }
}


PTLB(physidx) returns the TLB of a given physical page. The format of the result is: 
* bits 0-7: 0 
* bits 8-23: virtual page index 
* bits 24-26: flags 
* bits 27-31: 0 


b32 PTLB(b24 physidx) { 
    return TLB[physidx].flags << 24 | TLB[physidx].virt << 8;
} 


VTLB(virtaddr) returns the TLB that covers a given virtual address. The result is:
* bits 0-7: physical page index
* bits 8-23: 0
* bits 24-26: flags, ORed across all matches
* bit 30: set if more than one TLB matches [multihit error]
* bit 31: set if no TLB matches [no hit error]


b32 VTLB(b24 virtaddr) {
    phys = 0;
    flags = 0;
    matches = 0;
    for (i = 0; i < UC_CAPS.CODE_PAGES; i++) {
        if (TLB[i].flags && TLB[i].virt == (virtaddr >> 8 & ((1 << UC_CAPS2.VM_PAGES_LOG2) - 1))) {
            flags |= TLB[i].flags;
            phys = i;
            matches++;
        }
    }
    res = phys | flags << 24;
    if (matches == 0)
        res |= 0x80000000;
    if (matches > 1)
        res |= 0x40000000;
    return res;
}
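The three TLB operations can be exercised together in a small Python model that mirrors the pseudocode above. This is a sketch under stated assumptions: the class and attribute names are hypothetical, and code_pages / vm_pages_log2 stand in for the UC_CAPS bits 0-8 and UC_CAPS2 bits 16-19 values.

```python
USABLE, BUSY, SECRET = 1, 2, 4


class Tlb:
    """Hypothetical model of the reverse page table and PTLB/VTLB/ITLB."""

    def __init__(self, code_pages: int, vm_pages_log2: int):
        self.code_pages = code_pages          # UC_CAPS bits 0-8
        self.vm_pages_log2 = vm_pages_log2    # UC_CAPS2 bits 16-19
        self.flags = [0] * code_pages
        self.virt = [0] * code_pages

    def itlb(self, physidx: int) -> None:
        if not self.flags[physidx] & SECRET:  # secret pages can't be cleared
            self.flags[physidx] = 0
            self.virt[physidx] = 0

    def ptlb(self, physidx: int) -> int:
        return self.flags[physidx] << 24 | self.virt[physidx] << 8

    def vtlb(self, virtaddr: int) -> int:
        vpage = virtaddr >> 8 & ((1 << self.vm_pages_log2) - 1)
        phys = flags = matches = 0
        for i in range(self.code_pages):
            if self.flags[i] and self.virt[i] == vpage:
                flags |= self.flags[i]
                phys = i                      # last match wins
                matches += 1
        res = phys | flags << 24
        if matches == 0:
            res |= 0x80000000                 # no hit
        if matches > 1:
            res |= 0x40000000                 # multihit
        return res
```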


Executing TLB operations through IO 


The three *TLB operations can be executed by poking the TLB_CMD register. For PTLB and VTLB, the result will then be visible in the TLB_CMD_RES register:


MMIO 0x140 / I[0x05000]: TLB_CMD Runs a given TLB command on write, returns the last value written on read.
* bits 0-23: parameter to the TLB command
* bits 24-25: TLB command to execute
  - 1: ITLB
  - 2: PTLB
  - 3: VTLB


MMIO 0x144 / I[0x05100]: TLB_CMD_RES Read-only, returns the result of the last PTLB or VTLB operation launched through TLB_CMD.
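Composing the value written to TLB_CMD is simple bit packing. This minimal sketch only builds the value; the constant and function names are hypothetical, and the actual register write (from host MMIO or falcon IO) is left out.

```python
TLB_CMD_ITLB, TLB_CMD_PTLB, TLB_CMD_VTLB = 1, 2, 3


def tlb_cmd_value(cmd: int, param: int) -> int:
    """Compose a TLB_CMD write: bits 0-23 parameter, bits 24-25 command."""
    if cmd not in (TLB_CMD_ITLB, TLB_CMD_PTLB, TLB_CMD_VTLB):
        raise ValueError("unknown TLB command")
    if not 0 <= param < 1 << 24:
        raise ValueError("parameter must fit in 24 bits")
    return cmd << 24 | param
```

For example, looking up virtual address 0x1234 means writing 0x03001234 to TLB_CMD and then reading TLB_CMD_RES.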


TLB readout instructions: ptlb, vtlb 


These instructions run the corresponding TLB readout commands and return their results. 


Instructions: 
Name | Description        | Present on | Subopcode
ptlb | run PTLB operation | v3+ units  | 2
vtlb | run VTLB operation | v3+ units  | 3


Instruction class: unsized 


Operands: DST, SRC 


Forms: 
Form   | Opcode
R1, R2 | fe
Operation:
if (op == ptlb)
    DST = PTLB(SRC);
else
    DST = VTLB(SRC);
TLB invalidation instruction: itlb


This instruction runs the ITLB command.


Instructions: 


Name | Description        | Present on | Subopcode
itlb | run ITLB operation | v3+ units  | 8


Instruction class: unsized 


Operands: SRC 


Forms: 
Form | Opcode
R2   | f9
Operation: 
ITLB(SRC); 


VM usage on code execution 


Whenever an instruction fetch is attempted, the VTLB operation is done on the fetch address. If it returns a no-hit or multihit error, a trap is generated and the $tstatus reason field is set to 0xa [for no-hit] or 0xb [for multihit]. Note that, if the faulting instruction happens to cross a page boundary and the second page triggered a fault, the $pc register saved in $tstatus will not point to the page that faulted.


If no error was triggered, flag 0 [usable] is checked. If it's set, the access is finished using the physical page found by 
VTLB. If usable isn't set, but flag 1 [busy] is set, the fetch is paused and will be retried when TLBs are modified in 
any way. Otherwise, flag 2 [secret] must be the only flag set. In this case, a switch to authenticated mode is attempted 
- see Cryptographic coprocessor for details. 
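The fetch-time decision sequence can be sketched as a function of the VTLB result. The function name and the returned action labels are hypothetical; the checks follow the order described above (errors first, then usable, then busy, and finally the secret-only case).

```python
VM_NO_HIT, VM_MULTI_HIT = 0xa, 0xb


def fetch_action(vtlb_result: int):
    """Decide what an instruction fetch does, given the VTLB result.

    Returns an (action, detail) pair; the action names are illustrative.
    """
    if vtlb_result & 0x80000000:            # bit 31: no hit
        return ("trap", VM_NO_HIT)
    if vtlb_result & 0x40000000:            # bit 30: multihit
        return ("trap", VM_MULTI_HIT)
    phys = vtlb_result & 0xff               # bits 0-7: physical page
    flags = vtlb_result >> 24 & 7           # bits 24-26: flags
    if flags & 1:                           # usable: complete the access
        return ("fetch", phys)
    if flags & 2:                           # busy: retry when TLBs change
        return ("wait", phys)
    # Only the secret flag can be left: attempt to enter authenticated mode.
    return ("auth", phys)
```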


Code upload and peeking 


Code can be uploaded in two ways: direct upload via a window in IO space, or by an xfer [see Code/data xfers to/from external memory]. The relevant IO registers are:


MMIO 0x180 / I[0x06000]: CODE_INDEX Selects the place in the code segment accessed by the CODE register.

* bits 2-15: bits 2-15 of the physical code address to poke
* bit 24: write autoincrement flag: if set, every write to the corresponding CODE register increments the address by 4
* bit 25: read autoincrement flag: like 24, but for reads
* bit 28: secret: if set, will attempt a switch to secret lockdown on the next CODE write attempt and will mark uploaded code as secret.
* bit 29: secret lockdown [RO]: if set, currently in secret lockdown mode - CODE_INDEX cannot be modified manually until a complete page is uploaded, and will auto-increment on CODE writes irrespective of the write autoincrement flag. Reads will fail and won't auto-increment.
* bit 30: secret fail [RO]: if set, entering secret lockdown failed due to an attempt to start the upload from a non-page-aligned address.
* bit 31: secret reset scrubber active [RO]: if set, the window isn't currently usable because the reset scrubber is busy.


See Cryptographic coprocessor for the secret stuff. 


MMIO 0x184 / I[0x06100]: CODE Writes execute CST(CODE_INDEX & 0xfffc, value); and increment the address if write autoincrement is enabled or secret lockdown is in effect. Reads return the contents of the code segment at physical address CODE_INDEX & 0xfffc and increment if read autoincrement is enabled and secret lockdown is not in effect. Attempts to read from physical code pages with the secret flag will return 0xdead5ec1 instead of the real contents. The values read/written are 32-bit LE numbers corresponding to 4 bytes in the code segment.


MMIO 0x188 / I[0x06200]: CODE_VIRT Selects the virtual page index for uploaded code. The index is sampled when writing word 0 of each page.


CST is defined thus:


void CST(addr, value) {
    physidx = addr >> 8;
    // if secret lockdown needed for the page, but starting from non-0 address, fail.
    if ((addr & 0xfc) != 0 && (CODE_INDEX.secret || TLB[physidx].flags & 4) && !CODE_INDEX.secret_lockdown)
        CODE_INDEX.secret_fail = 1;
    if (CODE_INDEX.secret_fail || CODE_INDEX.secret_scrubber_active) {
        // nothing.
    } else {
        enter_lockdown = 0;
        exit_lockdown = 0;
        if ((addr & 0xfc) == 0) {
            // if first word uploaded...
            if (CODE_INDEX.secret || TLB[physidx].flags & 4) {
                // if uploading secret code, or uploading code to replace secret code, enter lockdown
                enter_lockdown = 1;
            }
            // store virt addr
            TLB[physidx].virt = CODE_VIRT;
            // clear usable flag, set busy flag
            TLB[physidx].flags = 2;
            if (CODE_INDEX.secret)
                TLB[physidx].flags |= 4;
        }
        code[addr] = value; // write 4 bytes to code segment
        if ((addr & 0xfc) == 0xfc) {
            // last word uploaded, page now complete.
            exit_lockdown = 1;
            // clear busy, set usable or secret
            if (CODE_INDEX.secret)
                TLB[physidx].flags = 4;
            else
                TLB[physidx].flags = 1;
        }
        if (CODE_INDEX.write_autoincrement || CODE_INDEX.secret_lockdown)
            addr += 4; // i.e. advance the CODE_INDEX address
        if (enter_lockdown)
            CODE_INDEX.secret_lockdown = 1;
        if (exit_lockdown)
            CODE_INDEX.secret_lockdown = 0;
    }
}
In summary, to upload a single page of code:
1. Set CODE_INDEX to physical addr | 0x1000000 [and | 0x10000000 if uploading secret code]
2. Set CODE_VIRT to the virtual page index it should be mapped at
3. Write 0x40 words to CODE
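The three-step upload can be written as a short host-side routine. This is a sketch under assumptions: `mmio_write` is a hypothetical callback that takes a host MMIO offset relative to the engine's falcon_base and a 32-bit value, and the offsets are the MMIO numbers given for CODE_INDEX, CODE and CODE_VIRT above. Page alignment is enforced because a secret upload from a non-aligned address sets secret fail.

```python
from typing import Callable, Sequence

CODE_INDEX, CODE, CODE_VIRT = 0x180, 0x184, 0x188   # host MMIO offsets
WRITE_AUTOINC = 1 << 24                              # CODE_INDEX bit 24
SECRET = 1 << 28                                     # CODE_INDEX bit 28


def upload_code_page(mmio_write: Callable[[int, int], None], phys_addr: int,
                     virt_page: int, words: Sequence[int],
                     secret: bool = False) -> None:
    """Upload one 0x100-byte code page through the CODE window."""
    if phys_addr & 0xff:
        raise ValueError("physical address must be page-aligned")
    if len(words) != 0x40:
        raise ValueError("a page is exactly 0x40 32-bit words")
    index = phys_addr | WRITE_AUTOINC
    if secret:
        index |= SECRET
    mmio_write(CODE_INDEX, index)      # step 1
    mmio_write(CODE_VIRT, virt_page)   # step 2: sampled on word 0
    for w in words:                    # step 3: autoincrement walks the page
        mmio_write(CODE, w)
```

Passing a logging callback instead of a real register accessor is a convenient way to check the write sequence.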


Uploading code via xfers will set TLB[physidx].virt = ext_offset >> 8 and TLB[physidx].flags = (secret ? 6 : 2) right after the xfer is started, then set TLB[physidx].flags = (secret ? 4 : 1) when it's complete. See Code/data xfers to/from external memory for more information.


2.10.8 Interrupts 


Contents 


* Interrupts
- Introduction
- Interrupt status and enable registers
- Interrupt mode setup
- Interrupt routing
- Interrupt delivery
- Trap delivery
- Returning from an interrupt: iret
- Software trap trigger: trap


Introduction 


falcon has interrupt support. There are 16 interrupt lines on each engine, and two interrupt vectors on the microprocessor. Each of the interrupt lines can be independently routed to one of the microprocessor vectors, or to the PMC interrupt line if the engine has one. The lines can be individually masked as well. They can be triggered by hw events or by the user.


The lines are: 


Line  | v3+ type         | Name     | Description
0     | edge             | PERIODIC | periodic timer
1     | edge             | WATCHDOG | watchdog timer
2     | level            | FIFO     | FIFO data available
3     | edge             | CHSW     | PFIFO channel switch
4     | edge             | EXIT     | processor stopped
5     | edge             | ???      | [related to falcon+0x0a4]
6-7   | edge             | SCRATCH  | scratch [unused by hw, user-defined]
8-9   | edge by default  | -        | engine-specific
10-15 | level by default | -        | engine-specific
Todo: figure out interrupt 5


Each interrupt line has a physical wire assigned to it. For edge-triggered interrupts, there's a flip-flop that's set by a 0-to-1 edge on the wire or a write to the INTR_SET register, and cleared by writing to the INTR_CLEAR register. For level-triggered interrupts, interrupt status is wired straight to the input.


Interrupt status and enable registers 


The interrupt and interrupt enable registers are actually visible as set/clear/status register triples: writing to the set register sets all bits that are 1 in the written value to 1. Writing to the clear register sets them to 0. The status register shows the current value when read, but cannot be written.


MMIO 0x000 / I[0x00000]: INTR_SET
MMIO 0x004 / I[0x00100]: INTR_CLEAR
MMIO 0x008 / I[0x00200]: INTR [status]
A mask of currently pending interrupts. Write to SET to manually trigger an interrupt. Write to CLEAR to ack an interrupt. Attempts to SET or CLEAR level-triggered interrupts are ignored.


MMIO 0x010 / I[0x00400]: INTR_EN_SET
MMIO 0x014 / I[0x00500]: INTR_EN_CLEAR
MMIO 0x018 / I[0x00600]: INTR_EN [status]


A mask of enabled interrupts. If a bit is set to 0 here, the interrupt handler isn't run when the given interrupt happens [but the INTR bit is still set, and the handler will run once the INTR_EN bit is set again].
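The set/clear/status triple can be modelled as one value with two write-only views. This hypothetical sketch uses a `settable` mask so that, for INTR, SET/CLEAR writes to level-triggered lines are ignored; it does not simulate the wire inputs that drive level-triggered bits directly. A handler is eligible only for bits pending in INTR and enabled in INTR_EN.

```python
class SetClearStatus:
    """Hypothetical model of a SET/CLEAR/status register triple.

    `settable` marks the bits SET/CLEAR writes may change; for INTR this
    excludes level-triggered lines, whose writes are ignored.
    """

    def __init__(self, settable: int = 0xffff):
        self.value = 0
        self.settable = settable

    def write_set(self, v: int) -> None:
        self.value |= v & self.settable

    def write_clear(self, v: int) -> None:
        self.value &= ~(v & self.settable)

    def read(self) -> int:
        return self.value
```

The test uses the v3+ reset value of INTR_MODE (0xfc04) to derive which INTR bits are settable.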


Interrupt mode setup 

MMIO 0x00c / I[0x00300]: INTR_MODE [v3+ only] Bits 0-15 are modes for the corresponding interrupt lines: 0 is edge triggered, 1 is level triggered.
Setting a sw interrupt to level-triggered, or a hw interrupt to a mode it wasn't designed for, is likely a bad idea.


This register is set to 0xfc04 on reset.
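Decoding the reset value confirms it agrees with the line table: only line 2 (FIFO) and lines 10-15 come up level-triggered. The helper name below is hypothetical.

```python
def decode_intr_mode(mode: int) -> list:
    """Return 'edge' or 'level' for each of the 16 interrupt lines."""
    return ["level" if mode >> line & 1 else "edge" for line in range(16)]
```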


Todo: check edge/level distinction on v0


Interrupt routing 


MMIO 0x01c / I[0x00700]: INTR_ROUTING
* bits 0-15: bit 0 of interrupt routing selector, one for each interrupt line
* bits 16-31: bit 1 of interrupt routing selector, one for each interrupt line
For each interrupt line, the two bits from the respective bitfields are put together to find its routing destination:
* 0: falcon vector 0
* 1: PMC HOST/DAEMON line
* 2: falcon vector 1
* 3: PMC NRHOST line [GF100+ selected engines only]
If the engine has a PMC interrupt line and any interrupt set for PMC irq delivery is active and unmasked, the corresponding PMC interrupt input line is active.
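Because the two selector bits of a line live 16 bits apart, reading or building INTR_ROUTING is easy to get wrong by hand. A minimal sketch with hypothetical helper names:

```python
DESTINATIONS = {
    0: "falcon vector 0",
    1: "PMC HOST/DAEMON line",
    2: "falcon vector 1",
    3: "PMC NRHOST line",
}


def route(intr_routing: int, line: int) -> int:
    """Routing selector (0-3) for one interrupt line (0-15)."""
    lo = intr_routing >> line & 1           # bits 0-15 hold selector bit 0
    hi = intr_routing >> (16 + line) & 1    # bits 16-31 hold selector bit 1
    return hi << 1 | lo


def routing_value(selectors: dict) -> int:
    """Build INTR_ROUTING from {line: selector}; unlisted lines get 0."""
    value = 0
    for line, sel in selectors.items():
        value |= (sel & 1) << line | (sel >> 1 & 1) << (16 + line)
    return value
```

For example, routing line 5 to falcon vector 1 sets only bit 21.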


Interrupt delivery 


falcon interrupt delivery is controlled by the $iv0, $iv1 registers and the ie0, ie1, is0, is1 $flags bits. $iv0 is the address of interrupt vector 0. $iv1 is the address of interrupt vector 1. ieX are interrupt enable bits for the corresponding vectors. isX are interrupt enable save bits - they store the previous status of the ieX bits during interrupt handler execution. Both ieX bits are always cleared to 0 when entering an interrupt handler.


Whenever there's an active and enabled interrupt set for vector X delivery, and ieX flag is set, vector X is called: 


$sp -= 4;
ST(32, $sp, $pc);
$flags.is0 = $flags.ie0;
$flags.is1 = $flags.ie1;
$flags.ie0 = 0;
$flags.ie1 = 0;
if (falcon_version >= 4) {
    $flags.unk16 = $flags.unk12;
    $flags.unk1d = $flags.unk1a;
    $flags.unk12 = 0;
}
if (vector == 0)
    $pc = $iv0;
else
    $pc = $iv1;
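Interrupt entry pairs with iret (documented under "Returning from an interrupt: iret"), which restores the saved enable bits. The sketch below models both on a minimal, hypothetical CPU state; it leaves out the v4-only unk* flag bits and uses a sparse dict for the stack memory.

```python
from dataclasses import dataclass, field


@dataclass
class FalconCpu:
    """Minimal hypothetical CPU state for interrupt entry/exit."""
    pc: int = 0
    sp: int = 0x1000
    iv0: int = 0
    iv1: int = 0
    ie0: int = 1
    ie1: int = 1
    is0: int = 0
    is1: int = 0
    mem: dict = field(default_factory=dict)   # sparse 32-bit word store


def deliver_interrupt(cpu: FalconCpu, vector: int) -> None:
    cpu.sp -= 4
    cpu.mem[cpu.sp] = cpu.pc             # ST(32, $sp, $pc)
    cpu.is0, cpu.is1 = cpu.ie0, cpu.ie1  # save the enable bits
    cpu.ie0 = cpu.ie1 = 0                # both vectors masked in the handler
    cpu.pc = cpu.iv0 if vector == 0 else cpu.iv1


def iret(cpu: FalconCpu) -> None:
    cpu.pc = cpu.mem[cpu.sp]
    cpu.sp += 4
    cpu.ie0, cpu.ie1 = cpu.is0, cpu.is1  # restore the saved enable bits
```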


Trap delivery 
falcon trap delivery is controlled by the $tv, $tstatus registers and the ta $flags bit. Traps behave like interrupts, but are triggered by events inside the UC.


$tv is the address of the trap vector. ta is the trap active flag. $tstatus is present on v3+ only and contains information about the last trap. The bitfields of $tstatus are:


* bits 0-19 [or as many bits as required]: faulting $pc
* bits 20-23: trap reason


The known trap reasons are: 


Reason | Name           | Description
0-3    | SOFTWARE       | software trap
8      | INVALID_OPCODE | invalid opcode
0xa    | VM_NO_HIT      | page fault - no hit
0xb    | VM_MULTI_HIT   | page fault - multi hit
0xf    | BREAKPOINT     | breakpoint hit


Whenever a trapworthy event happens on the uc, a trap is delivered: 


if ($flags.ta) { // double trap?
    EXIT;
}
$flags.ta = 1;
if (falcon_version != 0) // on v0, there's only one possible trap reason anyway
    $tstatus = $pc | reason << 20;
if (falcon_version >= 4) {
    $flags.is0 = $flags.ie0;
    $flags.is1 = $flags.ie1;
    $flags.unk16 = $flags.unk12;
    $flags.unk1d = $flags.unk1a;
    $flags.ie0 = 0;
    $flags.ie1 = 0;
    $flags.unk12 = 0;
}
$sp -= 4;
ST(32, $sp, $pc);
$pc = $tv;
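The trap path and the $tstatus layout can be sketched together. In this hypothetical model the CPU state is a plain dict, a double trap raises an exception where the hardware executes EXIT, and the v4-only flag handling is omitted. The decoder splits $tstatus into the faulting $pc (bits 0-19) and the reason (bits 20-23).

```python
class DoubleTrap(Exception):
    """Stands in for the EXIT taken on a double trap (hypothetical)."""


def deliver_trap(cpu: dict, reason: int, falcon_version: int = 3) -> None:
    if cpu["ta"]:                         # trap while a trap is active
        raise DoubleTrap()
    cpu["ta"] = 1
    if falcon_version != 0:               # v0 has only one trap reason
        cpu["tstatus"] = cpu["pc"] | reason << 20
    cpu["sp"] -= 4
    cpu["mem"][cpu["sp"]] = cpu["pc"]     # ST(32, $sp, $pc)
    cpu["pc"] = cpu["tv"]


def decode_tstatus(tstatus: int):
    """Split $tstatus into (faulting $pc, trap reason)."""
    return tstatus & 0xfffff, tstatus >> 20 & 0xf
```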


Todo: didn't ieX -> isX happen before v4? 


Returning from an interrupt: iret


Returns from an interrupt handler. 


Instructions: 


Name | Description              | Subopcode
iret | Return from an interrupt | 1


Instruction class: unsized
Operands: [none]


Forms:
Form          | Opcode
[no operands] | f8


Operation:
$pc = LD(32, $sp);
$sp += 4;
$flags.ie0 = $flags.is0;
$flags.ie1 = $flags.is1;
if (falcon_version >= 4) {
    $flags.unk12 = $flags.unk16;
    $flags.unk1a = $flags.unk1d;
}


Software trap trigger: trap 


Triggers a software trap. 
Instructions:
Name  | Description      | Present on | Subopcode
trap0 | software trap #0 | v3+ units  | 8
trap1 | software trap #1 | v3+ units  | 9
trap2 | software trap #2 | v3+ units  | a
trap3 | software trap #3 | v3+ units  | b


Instruction class: unsized 
Operands: [none] 


Forms: 


Form          | Opcode
[no operands] | f8


Operation: 


$pc += oplen; // return will be to the insn after this one 
TRAP(arg);


2.10.9 Code/data xfers to/from external memory 


Contents 


* Code/data xfers to/from external memory
- Introduction
- xfer special registers
- Submitting xfer requests: xcld, xdld, xdst
- Waiting for xfer completion: xcwait, xdwait
- Submitting xfer requests via IO space
- xfer queue status registers


Introduction 


The falcon has a builtin DMA controller that allows running asynchronous copies between falcon data/code segments and external memory.


An xfer request consists of the following: 


* mode: code load [external -> falcon code], data load [external -> falcon data], or data store [falcon data -> external]
* external port: 0-7. Specifies which external memory space the xfer should use.
* external base: 0-0xffffffff. Shifted left by 8 bits to obtain the base address of the transfer in external memory.
* external offset: 0-0xffffffff. Offset in external memory, and for v3+ code segments, the virtual address that code should be loaded at.
* local address: 0-0xffff. Offset in falcon code/data segment where data should be transferred. Physical address for code xfers.
* xfer size: 0-6 for data xfers, ignored for code xfers [always effectively 6]. The xfer copies (4<<size) bytes.
* secret flag: code xfers on secret engines only. Specifies if the xfer should load secret code.


Todo: one more unknown flag on secret engines 


Note that xfer functionality is greatly enhanced on secret engines to cover copying data to/from crypto registers. See 
Cryptographic coprocessor for details. 


xfer requests can be submitted either through special falcon instructions, or through poking IO registers. The requests 
are stored in a queue and processed asynchronously. 


A data load xfer copies (4<<$size) bytes from external memory port $port at address ($ext_base << 8) + $ext_offset to the falcon data segment at address $local_address. The external offset and local address have to be aligned to the xfer size.


A code load xfer copies 0x100 bytes from external memory port $port at address ($ext_base << 8) + $ext_offset to the falcon code segment at physical address $local_address. Right after queuing the transfer, the code page is marked "busy" and, for v3+, mapped to virtual address $ext_offset. If the secret flag is set, it'll also be set for the page. When the transfer is finished, the page flags are set to "usable" for non-secret pages, or "secret" for secret pages.
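The address arithmetic for both kinds of xfer can be sketched as follows. The helper names are hypothetical; external addresses are not truncated to any width because the text does not give one. The virtual page for code loads follows the TLB[physidx].virt = ext_offset >> 8 rule from the code upload section, and no alignment check is made for code xfers since none is stated.

```python
def data_xfer(ext_base: int, ext_offset: int, local_address: int, size: int):
    """Return (external address, local address, byte count) for a data xfer."""
    if not 0 <= size <= 6:
        raise ValueError("size must be 0-6")
    nbytes = 4 << size
    if ext_offset % nbytes or local_address % nbytes:
        raise ValueError("ext_offset and local_address must be aligned to the xfer size")
    return (ext_base << 8) + ext_offset, local_address, nbytes


def code_xfer(ext_base: int, ext_offset: int, local_address: int):
    """Return (external address, physical code address, byte count, virtual page)."""
    return (ext_base << 8) + ext_offset, local_address, 0x100, ext_offset >> 8
```

For example, a size-4 data load with ext_base 0x1234 and ext_offset 0x40 moves 64 bytes from external address 0x123440.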


xfer special registers 
There are 3 falcon special registers that hold parameters for uc-originated xfer requests. $xdbase stores the ext. base for data loads/stores, and $xcbase stores the ext. base for code loads. $xtargets stores the ports for the various types of xfer:

* bits 0-2: port for code loads
* bits 8-10: port for data loads
* bits 12-14: port for data stores


The external memory that falcon will use depends on the particular engine. See ../graph/gf100-ctxctl/memif.txt for GF100 PGRAPH CTXCTLs, and Memory interface for the other engines.


Submitting xfer requests: xcld, xdld, xdst 


These instructions submit xfer requests of the relevant type. ext_base and port are taken from the $xdbase/$xcbase and $xtargets special registers. ext_offset is taken from the first operand, local_address is taken from the low 16 bits of the second operand, and size [for data xfers] is taken from bits 16-18 of the second operand. The secret flag is taken from $cauth bit 16.


Instructions: 


Name | Description | Subopcode
xcld | code load   | 4
xdld | data load   | 5
xdst | data store  | 6


Instruction class: unsized 


Operands: SRC1, SRC2 
Forms:
Form   | Opcode
R2, R1 | fa
Operation:
if (op == xcld)
    XFER(mode=code_load, port=$xtargets[0:2], ext_base=$xcbase,
         ext_offset=SRC1, local_address=(SRC2&0xffff),
         secret=($cauth[16:16]));
else if (op == xdld)
    XFER(mode=data_load, port=$xtargets[8:10], ext_base=$xdbase,
         ext_offset=SRC1, local_address=(SRC2&0xffff),
         size=(SRC2>>16));
else // xdst
    XFER(mode=data_store, port=$xtargets[12:14], ext_base=$xdbase,
         ext_offset=SRC1, local_address=(SRC2&0xffff),
         size=(SRC2>>16));
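Preparing the operands for xcld/xdld/xdst amounts to packing two small bitfield values. The helpers below are hypothetical conveniences for generating the second operand and a $xtargets value; they only check the field ranges stated in the text.

```python
def xfer_src2(local_address: int, size: int = 0) -> int:
    """Pack the second xfer operand: bits 0-15 local address, bits 16-18 size."""
    if not 0 <= local_address <= 0xffff:
        raise ValueError("local address must fit in 16 bits")
    if not 0 <= size <= 6:
        raise ValueError("size must be 0-6")
    return size << 16 | local_address


def xtargets_value(code_port: int, load_port: int, store_port: int) -> int:
    """Compose $xtargets from the ports for code loads, data loads and data stores."""
    for port in (code_port, load_port, store_port):
        if not 0 <= port <= 7:
            raise ValueError("ports are 0-7")
    return code_port | load_port << 8 | store_port << 12
```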


Waiting for xfer completion: xcwait, xdwait 


These instructions wait until all xfers of the relevant type have finished. 


Instructions: 
Name   | Description                              | Subopcode
xdwait | wait for all data loads/stores to finish | 3
xcwait | wait for all code loads to finish        | 7
Instruction class: unsized
Operands: [none]
Forms:
Form          | Opcode
[no operands] | f8
Operation:
if (op == xcwait)
    while (XFER_ACTIVE(mode=code_load));
else
    while (XFER_ACTIVE(mode=data_load) || XFER_ACTIVE(mode=data_store));


Submitting xfer requests via IO space


There are 4 IO registers that can be used to manually submit xfer requests. The request is sent out by writing the XFER_CTRL register; the other registers have to be set beforehand.


MMIO 0x110 / I[0x04400]: XFER_EXT_BASE Specifies the ext base for the xfer that will be launched by XFER_CTRL.
MMIO 0x114 / I[0x04500]: XFER_LOCAL_ADDRESS Specifies the local address for the xfer that will be launched by XFER_CTRL.


MMIO 0x118 / I[0x04600]: XFER_CTRL Writing requests a new xfer with the given params; reading shows the last value written + two status flags

* bit 0: pending [RO]: The last write to XFER_CTRL is still waiting for a place in the queue. XFER_CTRL shouldn't be written until this bit clears.
* bit 1: ??? [RO]
* bit 2: secret flag [secret engines only]
* bit 3: ??? [secret engines only]
* bits 4-5: mode
  - 0: data load
  - 1: code load
  - 2: data store
* bits 8-10: size
* bits 12-14: port


Todo: figure out bit 1. Related to 0x10c? 


MMIO 0x11c / I[0x04700]: XFER_EXT_OFFSET Specifies the ext offset for the xfer that will be launched by XFER_CTRL.
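The XFER_CTRL value itself can be composed from the documented bitfields. In this hypothetical sketch the caller is assumed to have already written XFER_EXT_BASE, XFER_LOCAL_ADDRESS and XFER_EXT_OFFSET, and to poll the pending bit before writing XFER_CTRL again; the bit 1 and bit 3 unknowns are left at 0.

```python
XFER_MODES = {"data_load": 0, "code_load": 1, "data_store": 2}


def xfer_ctrl_value(mode: str, size: int, port: int, secret: bool = False) -> int:
    """Compose an XFER_CTRL write: mode in bits 4-5, size 8-10, port 12-14."""
    if not 0 <= size <= 6:
        raise ValueError("size must be 0-6")
    if not 0 <= port <= 7:
        raise ValueError("port must be 0-7")
    value = XFER_MODES[mode] << 4 | size << 8 | port << 12
    if secret:
        value |= 1 << 2        # bit 2: secret flag, secret engines only
    return value


def xfer_ctrl_pending(readback: int) -> bool:
    """Bit 0 of an XFER_CTRL read: the previous request is still queued."""
    return bool(readback & 1)
```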


Todo: how to wait for xfer finish using only IO? 


xfer queue status registers 


The status of the xfer queue can be read out through an IO register: 
MMIO 0x120 / I[0x04800]: XFER_STATUS

* bit 1: busy. 1 if any data xfer is pending.
* bits 4-5: ??? writable
* bits 16-18: number of data stores pending
* bits 24-26: number of data loads pending


Todo: bits 4-5 


Todo: RE and document this stuff, find if there's status for code xfers 
2.10.10 IO space


Contents 


* IO space
- Introduction
- Common IO register list
- Scratch registers
- Engine status and control registers
- v0 code/data upload registers
- IO space writes: iowr, iowrs
- IO space reads: iord


Introduction 


Every falcon engine has an associated IO space. The space consists of 32-bit IO registers, and is accessible in two 
ways: 


* host access by MMIO areas in BAR0
* falcon access by io* instructions 


The IO space contains control registers for the microprocessor itself, interrupt and timer setup, code/data space access 
ports, PFIFO communication registers, as well as registers for the engine-specific hardware that falcon is meant to 
control. 


The addresses are different between falcon and host. From falcon POV, the IO space is a word-addressable 0x40000-byte
space. However, most registers are duplicated 64 times: bits 2-7 of the address are ignored. The few registers that
don't ignore these bits are called "indexed" registers. From host POV, the falcon IO space is a 0x1000-byte window in
BAR0. Its base address is engine-dependent. The first 0xf00 bytes of this window are tied to the falcon IO space, while
the last 0x100 bytes contain several host-only registers. On G98:GF119, host MMIO address falcon_base + X is directed
to falcon IO space address X << 6 | HOST_IO_INDEX << 2. On GF119+, some engines stopped using the indexed
accesses. On those, host MMIO address falcon_base + X is directed to falcon IO space address X. HOST_IO_INDEX
is specified in the host-only MMIO register falcon_base + 0xffc:


MMIO 0xffc: HOST_IO_INDEX bits 0-5: selects bits 2-7 of the falcon IO space address when accessed from host.


Unaligned accesses to the IO space are unsupported, both from host and falcon. Low 2 bits of addresses should be 0 
at all times. 
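The host-to-falcon address mapping above can be sketched in Python as follows. `host_to_falcon_io` is a hypothetical helper; `indexed=False` models the GF119+ engines that dropped indexed access.

```python
HOST_WINDOW_IO = 0xf00  # first 0xf00 bytes of the host window map to IO space


def host_to_falcon_io(x, host_io_index=0, indexed=True):
    """Map host offset X (relative to falcon_base) to a falcon IO address."""
    if x & 3:
        raise ValueError("unaligned IO accesses are unsupported")
    if not 0 <= x < HOST_WINDOW_IO:
        raise ValueError("offsets 0xf00-0xfff are host-only registers")
    if indexed:
        # G98:GF119 style: X << 6 | HOST_IO_INDEX << 2
        return x << 6 | (host_io_index & 0x3f) << 2
    # engines without indexed access: host offset X maps to IO address X
    return x
```

For example, host offset 0x110 (XFER_EXT_BASE) lands on falcon address 0x4400, matching the table below.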


Todo: document v4 new addressing 


Common IO register list


Table 18

Host | Falcon | Present on | Name | Description
0x000 | 0x00000 | all units | INTR_SET | trigger interrupt
0x004 | 0x00100 | all units | INTR_CLEAR | clear interrupt
0x008 | 0x00200 | all units | INTR | interrupt status
0x00c | 0x00300 | v3+ units | INTR_MODE | interrupt edge/level
0x010 | 0x00400 | all units | INTR_EN_SET | interrupt enable set
0x014 | 0x00500 | all units | INTR_EN_CLR | interrupt enable clear
0x018 | 0x00600 | all units | INTR_EN | interrupt enable status
0x01c | 0x00700 | all units | INTR_DISPATCH | interrupt routing
0x020 | 0x00800 | all units | PERIODIC_PERIOD | periodic timer period
0x024 | 0x00900 | all units | PERIODIC_TIME | periodic timer counter
0x028 | 0x00a00 | all units | PERIODIC_ENABLE | periodic interrupt enable
0x02c | 0x00b00 | all units | TIME_LOW | PTIMER time low
0x030 | 0x00c00 | all units | TIME_HIGH | PTIMER time high
0x034 | 0x00d00 | all units | WATCHDOG_TIME | watchdog timer counter
0x038 | 0x00e00 | all units | WATCHDOG_ENABLE | watchdog interrupt enable
0x040 | 0x01000 | all units | SCRATCH0 | scratch register
0x044 | 0x01100 | all units | SCRATCH1 | scratch register
0x048 | 0x01200 | all units | FIFO_ENABLE | PFIFO access enable
0x04c | 0x01300 | all units | STATUS | busy/idle status [falcon/io.txt]
0x050 | 0x01400 | all units | CHANNEL_CUR | current PFIFO channel
0x054 | 0x01500 | all units | CHANNEL_NEXT | next PFIFO channel
0x058 | 0x01600 | all units | CHANNEL_CMD | PFIFO channel control
0x05c | 0x01700 | all units | STATUS_MASK | busy/idle status mask? [falcon/io.txt]
0x060 | 0x01800 | all units | VM_SUPERVISOR | ???
0x064 | 0x01900 | all units | FIFO_DATA | FIFO command data
0x068 | 0x01a00 | all units | FIFO_CMD | FIFO command
0x06c | 0x01b00 | v4+ units | FIFO_DATA_WR | FIFO command data write
0x070 | 0x01c00 | all units | FIFO_OCCUPIED | FIFO commands available
0x074 | 0x01d00 | all units | FIFO_ACK | FIFO command ack
0x078 | 0x01e00 | all units | FIFO_LIMIT | FIFO size
0x07c | 0x01f00 | all units | SUBENGINE_RESET | reset subengines [falcon/io.txt]
0x080 | 0x02000 | all units | SCRATCH2 | scratch register
0x084 | 0x02100 | all units | SCRATCH3 | scratch register
0x088 | 0x02200 | all units | PM_TRIGGER | perfmon triggers
0x08c | 0x02300 | all units | PM_MODE | perfmon signal mode
0x090 | 0x02400 | all units | ??? | ???
0x094 | 0x02500 | v3+ units | ??? | ???
0x098 | 0x02600 | v3+ units | BREAKPOINT[0] | code breakpoint
0x09c | 0x02700 | v3+ units | BREAKPOINT[1] | code breakpoint
0x0a0 | 0x02800 | v3+ units | ??? | ???
0x0a4 | 0x02900 | v3+ units | ENG_CONTROL | ???
0x0a8 | 0x02a00 | v4+ units | PM_SEL | perfmon signal select [falcon/perf.txt]
0x0ac | 0x02b00 | v4+ units | HOST_IO_INDEX | IO space index for host [falcon/io.txt] [XXX: doc]
0x0b0 | 0x02c00 | v5+ units | ??? | more breakpoints?
0x0b4 | 0x02d00 | v5+ units | ??? | more breakpoints?
0x0b8 | 0x02e00 | v5+ units | ??? | more breakpoints?
0x100 | 0x04000 | all units | UC_CTRL | microprocessor control [falcon/proc.txt]
0x104 | 0x04100 | all units | UC_ENTRY | microcode entry point [falcon/proc.txt]
0x108 | 0x04200 | all units | UC_CAPS | microprocessor caps [falcon/proc.txt]
0x10c | 0x04300 | all units | UC_BLOCK_ON_FIFO | microprocessor block [falcon/proc.txt]
0x110 | 0x04400 | all units | XFER_EXT_BASE | xfer external base
0x114 | 0x04500 | all units | XFER_FALCON_ADDR | xfer falcon address
0x118 | 0x04600 | all units | XFER_CTRL | xfer control
0x11c | 0x04700 | all units | XFER_EXT_ADDR | xfer external offset
0x120 | 0x04800 | all units | XFER_STATUS | xfer status
0x124 | 0x04900 | crypto units | CX_STATUS | crypt xfer status [falcon/crypt.txt]
0x128 | 0x04a00 | v3+ units | UC_STATUS | microprocessor status [falcon/proc.txt]
0x12c | 0x04b00 | v3+ units | UC_CAPS2 | microprocessor caps [falcon/proc.txt]
0x130 | 0x04c00 | v5+ units | UC_CTRL_ALIAS | microprocessor control [falcon/proc.txt]
0x134 | 0x04d00 | v5+ units | ??? | ???
0x140 | 0x05000 | v3+ units | TLB_CMD | code VM command
0x144 | 0x05100 | v3+ units | TLB_CMD_RES | code VM command result
0x148 | 0x05200 | v4+ units | BRANCH_HISTORY_CTRL | ???
0x14c | 0x05300 | v4+ units | BRANCH_HISTORY_PC | ???
0x150 | 0x05400 | UNK31 units | ??? | ???
0x154 | 0x05500 | UNK31 units | ??? | ???
0x158 | 0x05600 | UNK31 units | ??? | ???
0x160 | 0x05800 | UAS units | UAS_IO_WINDOW | UAS I[] space window [falcon/data.txt]
0x164 | 0x05900 | UAS units | UAS_CONFIG | UAS configuration [falcon/data.txt]
0x168 | 0x05a00 | UAS units | UAS_FAULT_ADDR | UAS MMIO fault address [falcon/data.txt]
0x16c | 0x05b00 | UAS units | UAS_FAULT_STATUS | UAS MMIO fault status [falcon/data.txt]
0x174 | 0x05d00 | v5+ units | ??? | ???
0x178 | 0x05e00 | v5+ units | ??? | ???
0x17c | 0x05f00 | v5+ units | ??? | ???
0x180 | 0x06000 | v3+ units | CODE_INDEX | code access window addr
0x184 | 0x06100 | v3+ units | CODE | code access window
0x188 | 0x06200 | v3+ units | CODE_VIRT_ADDR | code access virt addr
0x1c0 | 0x07000 | v3+ units | DATA_INDEX[0] | data access window addr
0x1c4 | 0x07100 | v3+ units | DATA[0] | data access window
0x1c8 | 0x07200 | v3+ units | DATA_INDEX[1] | data access window addr
0x1cc | 0x07300 | v3+ units | DATA[1] | data access window
0x1d0 | 0x07400 | v3+ units | DATA_INDEX[2] | data access window addr
0x1d4 | 0x07500 | v3+ units | DATA[2] | data access window
0x1d8 | 0x07600 | v3+ units | DATA_INDEX[3] | data access window addr
0x1dc | 0x07700 | v3+ units | DATA[3] | data access window
0x1e0 | 0x07800 | v3+ units | DATA_INDEX[4] | data access window addr
0x1e4 | 0x07900 | v3+ units | DATA[4] | data access window
0x1e8 | 0x07a00 | v3+ units | DATA_INDEX[5] | data access window addr
0x1ec | 0x07b00 | v3+ units | DATA[5] | data access window
0x1f0 | 0x07c00 | v3+ units | DATA_INDEX[6] | data access window addr
0x1f4 | 0x07d00 | v3+ units | DATA[6] | data access window
0x1f8 | 0x07e00 | v3+ units | DATA_INDEX[7] | data access window addr
0x1fc | 0x07f00 | v3+ units | DATA[7] | data access window
0x200 | 0x08000 | v4+ units | DEBUG_CMD | debugging command [falcon/debug.txt]
0x204 | 0x08100 | v4+ units | DEBUG_ADDR | address for DEBUG_CMD [falcon/debug.txt]
0x208 | 0x08200 | v4+ units | DEBUG_DATA_WR | debug data to write [falcon/debug.txt]
0x20c | 0x08300 | v4+ units | DEBUG_DATA_RD | debug data last read [falcon/debug.txt]
0x240 | 0x09000 | v5+ units | ??? | ???
0xfe8 | - | GF100- v3 | PM_SEL | perfmon signal select [falcon/perf.txt]
0xfec | - | v0, v3 | UC_SP | microprocessor $sp reg [falcon/proc.txt]
0xff0 | - | v0, v3 | UC_PC | microprocessor $pc reg [falcon/proc.txt]
0xff4 | - | v0, v3 | UPLOAD | old code/data upload
0xff8 | - | v0, v3 | UPLOAD_ADDR | old code/data up addr
0xffc | - | v0, v3 | HOST_IO_INDEX | IO space index for host [falcon/io.txt]


Todo: list incomplete for v4 


Registers starting from 0x400/0x10000 are engine-specific and described in engine documentation.


Scratch registers


MMIO 0x040 / I[0x01000]: SCRATCH0
MMIO 0x044 / I[0x01100]: SCRATCH1
MMIO 0x080 / I[0x02000]: SCRATCH2
MMIO 0x084 / I[0x02100]: SCRATCH3


Scratch 32-bit registers, meant for host <-> falcon communication. 


Engine status and control registers 


MMIO 0x04c / I[0x01300]: STATUS Status of various parts of the engine. For each bit, 1 means busy, 0 means
idle.
* bit 0: UC. Microcode. 1 if microcode is running and not on a sleep insn.
* bit 1: ???
Further bits are engine-specific.


MMIO 0x05c / I[0x01700]: STATUS_MASK A bitmask of nonexistent status bits. Each of bits 0-15 is set to 0 if the
corresponding STATUS line is tied to anything in this particular engine, 1 if it's unused. [?]


Todo: clean. fix. write. move. 


MMIO 0x07c / I[0x01f00]: SUBENGINE_RESET When written with value 1, resets all subengines that this falcon
engine controls - that is, everything in IO space addresses 0x10000:0x20000. Note that this includes the memory
interface - using this register while an xfer is in progress is ill-advised.


v0 code/data upload registers


MMIO 0xff4: UPLOAD The data to upload, see below


MMIO 0xff8: UPLOAD_ADDR
* bits 2-15: bits 2-15 of the code/data address being uploaded.
* bit 20: target segment. 0 means data, 1 means code.
* bit 21: readback.
* bit 24: xfer busy [RO]
* bit 28: secret flag - secret engines only [see falcon/crypt.txt]
* bit 29: code busy [RO]


This pair of registers can be used on v0 to read/write code and data segments. It's quite fragile and should only be
used when no xfers are active; bit 24 of UPLOAD_ADDR (xfer busy) is set while an xfer is in progress. On v3+, this
pair is broken and should be avoided in favor of the new-style access via CODE and DATA ports.


To write data, poke the address to UPLOAD_ADDR, then poke the data words to UPLOAD. The address will
auto-increment as words are uploaded.


To read data or code, poke address + readback flag to UPLOAD_ADDR, then read the word from UPLOAD. This
only works for a single word, and you need to poke UPLOAD_ADDR again for each subsequent word.
The code segment is organised in 0x100-byte pages. On secretful engines, each page can be secret or not. Reading
from secret pages doesn't work and you just get 0. Writing code segment can only be done in aligned page units.


To write a code page, write the start address of the page + secret flag [if needed] to UPLOAD_ADDR, then poke a
multiple of 0x40 words to UPLOAD. The address will autoincrement. The process cannot be interrupted except between pages.


The "code busy" flag in UPLOAD_ADDR will be lit while such a code page write is in progress.
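A sketch of the v0 UPLOAD/UPLOAD_ADDR procedures, with hypothetical `wr32`/`rd32` MMIO callbacks and `base` standing for the engine's falcon_base. Polling bit 24 before starting is an assumption drawn from the fragility note, not a documented handshake.

```python
UPLOAD = 0xff4
UPLOAD_ADDR = 0xff8

CODE_SEGMENT = 1 << 20  # bit 20: 0 = data, 1 = code
READBACK = 1 << 21      # bit 21: readback
XFER_BUSY = 1 << 24     # bit 24: xfer busy [RO]
SECRET = 1 << 28        # bit 28: secret flag, secret engines only


def upload_addr(addr, code=False, readback=False, secret=False):
    """Build an UPLOAD_ADDR value; only address bits 2-15 are used."""
    value = addr & 0xfffc
    if code:
        value |= CODE_SEGMENT
    if readback:
        value |= READBACK
    if secret:
        value |= SECRET
    return value


def v0_upload(wr32, rd32, base, addr, words, code=False, secret=False):
    """Write words starting at addr; the address auto-increments per word."""
    if code and (addr % 0x100 or len(words) % 0x40):
        raise ValueError("code must be written in whole 0x100-byte pages")
    # Assumption: wait for xfer busy to clear before touching UPLOAD.
    while rd32(base + UPLOAD_ADDR) & XFER_BUSY:
        pass
    wr32(base + UPLOAD_ADDR, upload_addr(addr, code=code, secret=secret))
    for word in words:
        wr32(base + UPLOAD, word & 0xffffffff)


def v0_readback(wr32, rd32, base, addr, count, code=False):
    """Read count words; each readback needs its own UPLOAD_ADDR poke."""
    result = []
    for i in range(count):
        wr32(base + UPLOAD_ADDR, upload_addr(addr + 4 * i, code=code, readback=True))
        result.append(rd32(base + UPLOAD))
    return result
```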


IO space writes: iowr, iowrs 


Writes a word to IO space. iowr does asynchronous writes [queues the write, but doesn't wait for completion], iowrs
does synchronous writes [the write is guaranteed to complete before executing the next instruction]. On v0 cards, iowrs
doesn't exist and synchronisation can instead be done by re-reading the relevant register.


Instructions:
Name | Description | Present on | Subopcode
iowr | Asynchronous IO space write | all units | 0
iowrs | Synchronous IO space write | v3+ units | 1
Instruction class: unsized
Operands: BASE, IDX, SRC
Forms:
Form | Subopcode
R2, I8, R1 | d0
R2, 0, R1 | fa
Immediates: zero-extended
Operation:
if (op == iowr)
    IOWR(BASE + IDX * 4, SRC);
else
    IOWRS(BASE + IDX * 4, SRC);


IO space reads: iord
Reads a word from IO space. 
Instructions: 
Name | Description | Present on | Subopcode
??? | ??? | v3+ units | e
iord | IO space read | all units | f


Instruction class: unsized 


Operands: DST, BASE, IDX 


Forms: 
Form | Subopcode
R1, R2, I8 | c0
R3, R2, R1 | ff


Immediates: zero-extended 


Operation: 
if (op == iord)
    DST = IORD(BASE + IDX * 4);
else
    ???;


Todo: subop e
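The address arithmetic shared by iowr, iowrs and iord, combined with the 64-fold register duplication described in the introduction, is sketched below; both helpers are hypothetical and only model address computation, not the access itself.

```python
def io_effective_address(base, idx):
    """Address accessed by iowr/iowrs/iord: BASE + IDX * 4."""
    addr = base + idx * 4
    if addr & 3:
        raise ValueError("low 2 bits of IO addresses must be 0")
    return addr


def io_register(addr, indexed=False):
    """Register actually hit: non-indexed registers ignore address bits 2-7."""
    return addr if indexed else addr & ~0xfc
```

So an access with BASE = 0x1000 and IDX = 1 goes to 0x1004, which on a non-indexed register is just another copy of SCRATCH0.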


2.10.11 Timers 


Contents 


* Timers
- Introduction
- Periodic timer
- Watchdog timer


Introduction 
Time and timer-related registers are the same on all falcon engines, except PGRAPH CTXCTLs which lack PTIMER
access.
You can: 
* Read PTIMER's clock 
* Use a periodic timer: Generate an interrupt periodically 
* Use a watchdog/one-shot timer: Generate an interrupt once in the future 


Also note that the CTXCTLs have another watchdog timer on their own - see ../graph/gf100-ctxctl/intro.txt for more 
information. 


Periodic timer
All falcon engines have a periodic timer. This timer generates periodic interrupts on interrupt line 0. The registers
controlling this timer are:


MMIO 0x020 / I[0x00800]: PERIODIC_PERIOD A 32-bit register defining the period of the periodic timer, minus
1.


MMIO 0x024 / I[0x00900]: PERIODIC_TIME A 32-bit counter storing the time remaining before the tick.


MMIO 0x028 / I[0x00a00]: PERIODIC_ENABLE bit 0: Enable the periodic timer. If 0, the counter doesn't
change and no interrupts are generated.


When the counter is enabled, PERIODIC_TIME decreases by 1 every clock cycle. When PERIODIC_TIME reaches
0, an interrupt is generated on line 0 and the counter is reset to PERIODIC_PERIOD.


Operation (after each falcon core clock tick): 


if (PERIODIC_ENABLE) {
    if (PERIODIC_TIME == 0) {
        PERIODIC_TIME = PERIODIC_PERIOD;
        intr_line[0] = 1;
    } else {
        PERIODIC_TIME--;
        intr_line[0] = 0;
    }
} else {
    intr_line[0] = 0;
}
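The periodic timer pseudocode above can be modelled in Python as follows (a sketch; the class is hypothetical). Because the counter reloads to PERIODIC_PERIOD and counts down to 0 inclusive, an interrupt fires every PERIODIC_PERIOD + 1 ticks, which is why the register holds the period minus 1.

```python
class PeriodicTimer:
    """Model of PERIODIC_PERIOD / PERIODIC_TIME / PERIODIC_ENABLE."""

    def __init__(self, period, time=0, enable=True):
        self.period = period & 0xffffffff  # PERIODIC_PERIOD (period minus 1)
        self.time = time & 0xffffffff      # PERIODIC_TIME
        self.enable = enable               # PERIODIC_ENABLE bit 0

    def tick(self):
        """Advance one falcon core clock; return the level of interrupt line 0."""
        if not self.enable:
            return 0
        if self.time == 0:
            self.time = self.period
            return 1
        self.time -= 1
        return 0
```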
PTIMER access


The falcon engines other than PGRAPH's CTXCTLs have PTIMER's time registers aliased into their IO space. The aliases
are:


MMIO 0x02c / I[0x00b00]: TIME_LOW Alias of PTIMER's TIME_LOW register [MMIO 0x9400]
MMIO 0x030 / I[0x00c00]: TIME_HIGH Alias of PTIMER's TIME_HIGH register [MMIO 0x9410]


Both of these registers are read-only. See ptimer for more information about PTIMER. 


Watchdog timer 

Apart from a periodic timer, the falcon engines also have an independent one-shot timer, also called the watchdog timer.
It can be used to set up a single interrupt in the near future. The registers are:

MMIO 0x034 / I[0x00d00]: WATCHDOG_TIME A 32-bit counter storing the time remaining before the interrupt.


MMIO 0x038 / I[0x00e00]: WATCHDOG_ENABLE bit 0: Enable the watchdog timer. If 0, the counter doesn't
change and no interrupts are generated.


A classic use of a watchdog is to set it before calling a sensitive function by initializing it to, for instance, twice the 
usual time needed by this function to be executed. 


In falcon's case, the watchdog doesn't reboot the microprocessor. Instead, it is very similar to the periodic timer. The
differences are:


* it generates an interrupt on line 1 instead of 0. 
* it needs to be reset manually 


Operation (after each falcon core clock tick): 


if (WATCHDOG_ENABLE) {
    if (WATCHDOG_TIME == 0) {
        intr_line[1] = 1;
    } else {
        WATCHDOG_TIME--;
        intr_line[1] = 0;
    }
} else {
    intr_line[1] = 0;
}
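A matching Python model of the watchdog (hypothetical class). Unlike the periodic timer, it does not reload: once WATCHDOG_TIME reaches 0, interrupt line 1 stays asserted on every tick until software writes a new time or disables the timer.

```python
class WatchdogTimer:
    """Model of WATCHDOG_TIME / WATCHDOG_ENABLE."""

    def __init__(self, time=0, enable=True):
        self.time = time & 0xffffffff  # WATCHDOG_TIME
        self.enable = enable           # WATCHDOG_ENABLE bit 0

    def tick(self):
        """Advance one falcon core clock; return the level of interrupt line 1."""
        if not self.enable:
            return 0
        if self.time == 0:
            return 1  # no reload: must be reset manually
        self.time -= 1
        return 0
```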


2.10.12 Performance monitoring signals 


Contents 


* Performance monitoring signals 
- Introduction 


- Main PCOUNTER signals 


— User signals 


Todo: write me 


Introduction 


Todo: write me 


Main PCOUNTER signals 


The main signals exported by falcon to PCOUNTER are: 


Todo: docs & RE, please 


* 0x00: SLEEPING
* 0x01: ??? fifo idle?
* 0x02: IDLE
* 0x03: ???
* 0x04: ???
* 0x05: TA
* 0x06: ???
* 0x07: ???
* 0x08: ???
* 0x09: ???
* 0x0a: ???
* 0x0b: ???
* 0x0c: PM_TRIGGER
* 0x0d: WRCACHE_FLUSH
* 0x0e-0x13: USER


User signals 
MMIO 0x088 / I[0x02200]: PM_TRIGGER A WO "trigger" register for various things. Write 1 to a bit to trigger
the relevant event, 0 to do nothing.
* bits 0-5: ??? [perf counters?]
* bit 16: WRCACHE_FLUSH
* bit 17: ??? [PM_TRIGGER?]
MMIO 0x08c / I[0x02300]: PM_MODE bits 0-5: ??? [perf counters?]


Todo: write me 


2.10.13 Debugging 


Contents 


* Debugging 


— Breakpoints 


Todo: write me 


Breakpoints 


Todo: write me 


2.10.14 FIFO interface 


Contents 


* FIFO interface 
- Introduction 
— PFIFO access control 
- Method FIFO 


— Channel switching 
Todo: write me


Introduction 


Todo: write me 


PFIFO access control 


Todo: write me 


Method FIFO 


Todo: write me 


Channel switching 


Todo: write me 


2.10.15 Memory interface 


Contents 


* Memory interface
- Introduction
- IO Registers
- Error interrupts
- Breakpoints
- Busy status


Todo: write me 


2.10. falcon microprocessor 


371 


nVidia Hardware Documentation, Release git 


Introduction 


Todo: write me


IO Registers


Todo: write me 


Error interrupts 


Todo: write me 


Breakpoints 


Todo: write me 


Busy status 


Todo: write me 


2.10.16 Cryptographic coprocessor 


Contents 


* Cryptographic coprocessor
- Introduction
- IO registers
- Interrupts
- Submitting crypto commands: ccmd
- Code authentication control
- Crypto xfer control


Todo: write me 
Introduction


Todo: write me


IO registers


Todo: write me 


Interrupts 


Todo: write me 


Submitting crypto commands: ccmd 


Todo: write me 


Code authentication control 


Todo: write me 


Crypto xfer control 


Todo: write me 


2.11 Video decoding, encoding, and processing 


Contents: 


2.11.1 VPE video decoding and encoding 


Contents: 
PMPEG: MPEG1/MPEG2 video decoding engine


Contents 


* PMPEG: MPEG1/MPEG2 video decoding engine
- Introduction 


— MMIO registers 


- Interrupts 


Todo: write me 


Introduction 


Todo: write me 


MMIO registers 


Todo: write me 


Interrupts 


Todo: write me 


PME: motion estimation 


Contents: 


PVP1: video processor 


Contents: 


Scalar unit 


Contents 


* Scalar unit 
- Introduction
  * Scalar registers
  * Scalar to vector data bus
- Instruction format
  * Opcodes
  * Bad opcodes
  * Source mangling
- Instructions
  * Load immediate: mov
  * Set high bits: sethi
  * Move to/from other register file: mov
  * Arithmetic operations: mul, min, max, abs, neg, add, sub, shr, sar
  * Bit operations: bitop
  * Bit operations with immediate: and, or, xor
  * Simple bytewise operations: bmin, bmax, babs, bneg, badd, bsub
  * Bytewise bit operations: band, bor, bxor
  * Bytewise bit shift operations: bshr, bsar
  * Bytewise multiplication: bmul
  * Send immediate to vector unit: vec
  * Send mask to vector unit and shift: vecms
  * Send bytes to vector unit: bvec
  * Bytewise multiply, add, and send to vector unit: bvecmad, bvecmadsel


Introduction 


The scalar unit is one of the four execution units of VP1. It is used for general-purpose arithmetic. 


Scalar registers 


The scalar unit has 31 GPRs, $r0-$r30. They are 32 bits wide, and are usually used as 32-bit integers, but there are 
also SIMD instructions treating them as arrays of 4 bytes. In such cases, array notation is used to denote the individual 
bytes. Bits 0-7 are considered to be $rX[0], bits 8-15 are $rX[1] and so on. $r31 is a special register hardwired 
to 0. 


There are also 8 bits in each $c register belonging to the scalar unit. Most scalar instructions can (if requested) set 
these bits according to the computation result. The bits are: 


* bit 0: sign flag - set equal to bit 31 of the result 
* bit 1: zero flag - set if the result is 0 


* bit 2: b19 flag - set equal to bit 19 of the result 
* bit 3: b20 difference flag - set if bit 20 of the result is different from bit 20 of the first source
* bit 4: b20 flag - set equal to bit 20 of the result
* bit 5: b21 flag - set equal to bit 21 of the result
* bit 6: alt b19 flag (G80 only) - set equal to bit 19 of the result
* bit 7: b18 flag (G80 only) - set equal to bit 18 of the result


The purpose of the last 6 bits is so far unknown. 


Scalar to vector data bus 


In addition to performing computations of its own, the scalar unit is also used in tandem with the vector unit to perform 
complex instructions. Certain scalar opcodes expose data on so-called s2v path (scalar to vector data bus), and certain 
vector opcodes consume this data. 


The data is ephemeral and only exists during the execution of a single bundle - the producing and consuming
instructions must be located in the same bundle. If a consuming instruction is used without a producing instruction,
it'll read junk. If a producing instruction is used without a consuming instruction, the data is discarded.


The s2v data consists of:
* 4 signed 10-bit factors, used for multiplication
* $vc selection and transformation, for use as mask input in the vector unit, made of:
  - valid flag: 1 if s2v data was emitted by a proper s2v-emitting instruction (if false, the vector unit will use an
    alternate source not involving s2v)
  - 2-bit $vc register index
  - 1-bit zero flag or sign flag selection (selects which half of $vc will be used)
  - 3-bit transform mode: used to mangle the $vc value before use as mask


The factors can alternatively be treated as two 16-bit masks by some instructions. In that case, mask 0 consists of bits
1-8 of factor 0, then bits 1-8 of factor 1, and mask 1 likewise consists of bits 1-8 of factors 2 and 3:


s2v.mask[0] = (s2v.factor[0] >> 1 & 0xff) | (s2v.factor[1] >> 1 & 0xff) << 8
s2v.mask[1] = (s2v.factor[2] >> 1 & 0xff) | (s2v.factor[3] >> 1 & 0xff) << 8


The $vc based mask is derived as follows: 


def xfrm(val, tab):
    res = 0
    for idx in range(16):
        # bit x of result is set if bit tab[x] of input is set
        if val & 1 << tab[idx]:
            res |= 1 << idx
    return res

val = $vc[s2v.vcsel.idx]
# val2 is only used for transform mode 7
val2 = $vc[s2v.vcsel.idx | 1]
if s2v.vcsel.flag == 'sf':
    val = val & 0xffff
    val2 = val2 & 0xffff
else: # 'zf'
    val = val >> 16 & 0xffff
    val2 = val2 >> 16 & 0xffff

if s2v.vcsel.xfrm == 0:
    # passthrough
    s2v.vcmask = val
elif s2v.vcsel.xfrm in (1, 2, 3, 4, 5, 6):
    # each of modes 1-6 is s2v.vcmask = xfrm(val, tab) with its own
    # 16-entry table tab
    ...
elif s2v.vcsel.xfrm == 7:
    # mode 7 is special: it uses two $vc inputs and takes every second bit
    s2v.vcmask = xfrm(val | val2 << 16, [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30])


Instruction format 


The instruction word fields used in scalar instructions are: 


* bits 0-2: CDST - if < 4, index of the $c register to set according to the instruction's result. Otherwise, an
  indication that $c is not to be written (nVidia appears to use 7 in such case).
* bits 0-7: BIMMBAD - an immediate field used only in bad opcodes
* bits 0-18: IMM19 - a signed 19-bit immediate field used only by the mov instruction
* bits 0-15: IMM16 - a 16-bit immediate field used only by the sethi instruction
* bits 1-9: FACTOR1 - a 9-bit signed immediate used as vector factor
* bits 10-18: FACTOR2 - a 9-bit signed immediate used as vector factor
* bit 1: SIGN2 - determines if byte multiplication source 2 is signed
  - 0: u - unsigned
  - 1: s - signed
* bit 2: SIGN1 - likewise for source 1
* bits 3-10: BIMM - an 8-bit immediate for bytewise operations, signed or unsigned depending on instruction.
* bits 3-13: IMM - signed 13-bit immediate.
* bits 3-6: BITOP - selects the bit operation to perform
* bits 3-7: RFILE - selects the other register file for mov to/from other register file
* bits 3-4: COND - if source mangling is used, the $c register index to use for source mangling.
* bits 5-8: SLCT - if source mangling is used, the condition to use for source mangling.
* bit 8: RND - determines byte multiplication rounding behaviour
  - 0: rd - round down
  - 1: rn - round to nearest, ties rounding up
* bits 9-13: SRC2 - the second source $r register, often mangled via source mangling.
* bits 9-13 (low 5 bits) and bit 0 (high bit): BIMMMUL - a 6-bit immediate for bytewise multiplication, signed or
  unsigned depending on instruction.
* bits 14-18: SRC1 - the first source $r register.
* bits 19-23: DST - the destination $r register.
* bits 19-20: VCIDX - the $vc register index for s2v
* bit 21: VCFLAG - the $vc flag selection for s2v:
  - 0: sf
  - 1: zf
* bits 22-23 (low part) and 0 (high part): VCXFRM - the $vc transformation for s2v
* bits 24-31: OP - the opcode.
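A sketch of a field extractor for the layout above (hypothetical helper names). Fields overlap because different instructions use different subsets; VCXFRM and BIMMMUL show the split fields whose high bit lives in bit 0. The 13-bit IMM field is omitted from this sketch.

```python
def field(word, lo, hi):
    """Extract bits lo..hi (inclusive) of an instruction word."""
    return word >> lo & ((1 << (hi - lo + 1)) - 1)


def sext(value, bit):
    """Sign-extend value, treating bit `bit` as the sign bit."""
    value &= (1 << (bit + 1)) - 1
    return value - (1 << (bit + 1)) if value >> bit & 1 else value


def decode_scalar(word):
    """Return every scalar-unit field; each instruction uses only some."""
    return {
        "OP": field(word, 24, 31),
        "DST": field(word, 19, 23),
        "SRC1": field(word, 14, 18),
        "SRC2": field(word, 9, 13),
        "SLCT": field(word, 5, 8),
        "COND": field(word, 3, 4),
        "RFILE": field(word, 3, 7),
        "BITOP": field(word, 3, 6),
        "CDST": field(word, 0, 2),
        "IMM19": sext(field(word, 0, 18), 18),
        "IMM16": field(word, 0, 15),
        "VCIDX": field(word, 19, 20),
        "VCFLAG": "zf" if field(word, 21, 21) else "sf",
        # split fields: high bit comes from bit 0
        "VCXFRM": field(word, 22, 23) | field(word, 0, 0) << 2,
        "BIMMMUL": field(word, 9, 13) | field(word, 0, 0) << 5,
    }
```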


Opcodes 


The opcode range assigned to the scalar unit is 0x00-0x7f. The opcodes are:
* 0x01, 0x11, 0x21, 0x31: bytewise multiplication: bmul
* 0x02, 0x12, 0x22, 0x32: bytewise multiplication: bmul (bad opcode)
* 0x04: s2v multiply/add/send: bvecmad
* 0x24: s2v immediate send: vec
* 0x05: s2v multiply/add/select/send: bvecmadsel
* 0x25: bytewise immediate and: band
* 0x26: bytewise immediate or: bor
* 0x27: bytewise immediate xor: bxor
* 0x08, 0x18, 0x28, 0x38: bytewise minimum: bmin
* 0x09, 0x19, 0x29, 0x39: bytewise maximum: bmax
* 0x0a, 0x1a, 0x2a, 0x3a: bytewise absolute value: babs
* 0x0b, 0x1b, 0x2b, 0x3b: bytewise negate: bneg
* 0x0c, 0x1c, 0x2c, 0x3c: bytewise addition: badd
* 0x0d, 0x1d, 0x2d, 0x3d: bytewise subtract: bsub
* 0x0e, 0x1e, 0x2e, 0x3e: bytewise shift: bshr, bsar
* 0x0f: s2v send: bvec
* 0x41, 0x51, 0x61, 0x71: 16-bit multiplication: mul
* 0x42: bitwise operation: bitop
* 0x62: immediate and: and
* 0x63: immediate xor: xor
* 0x64: immediate or: or
* 0x45: s2v 4-bit mask send and shift: vecms
* 0x65: load immediate: mov
* 0x75: set high bits immediate: sethi
* 0x6a: mov to other register file: mov
* 0x6b: mov from other register file: mov
* 0x48, 0x58, 0x68, 0x78: minimum: min
* 0x49, 0x59, 0x69, 0x79: maximum: max
* 0x4a, 0x5a, 0x7a: absolute value: abs
* 0x4b, 0x5b, 0x7b: negation: neg
* 0x4c, 0x5c, 0x6c, 0x7c: addition: add
* 0x4d, 0x5d, 0x6d, 0x7d: subtraction: sub
* 0x4e, 0x5e, 0x6e, 0x7e: shift: shr, sar
* 0x4f: the canonical scalar nop opcode


Todo: some unused opcodes clear $c, some don't 


Bad opcodes 


Some of the VP1 instructions look like they're either buggy or just unintended artifacts of incomplete decoding
hardware. These are known as bad opcodes and are characterised by using colliding bitfields. It's probably a bad idea to
use them, but they do seem to reliably perform as documented here.


Source mangling 


Some instructions perform source mangling: the source register(s) they use are not taken directly from a register index 
bitfield in the instruction. Instead, the register index from the instruction is... “adjusted” before use. There are several 
algorithms used for source mangling, most of them used only in a single instruction. 


The most common one, known as SRC2S, takes the register index from SRC2 field, a $c register index from COND, 
and $c bit index from SLCT. If SLCT is anything other than 4, the selected bit is extracted from $c and XORed into 
the lowest bit of the register index to use. Otherwise (SLCT is 4), bits 4-5 of $c are extracted, and added to bits 0-1 of 
the register index, discarding overflow out of bit 1: 


if SLCT == 4:
    adjust = $c[COND] >> 4 & 3
    SRC2S = (SRC2 & ~3) | ((SRC2 + adjust) & 3)
else:
    adjust = $c[COND] >> SLCT & 1
    SRC2S = SRC2 ^ adjust
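The same SRC2S rule as a runnable Python function; `c` stands for the value of $c[COND], and the function name is hypothetical.

```python
def src2s(src2, c, slct):
    """Compute the mangled register index SRC2S from SRC2, $c[COND] and SLCT."""
    if slct == 4:
        # add bits 4-5 of $c to the low 2 bits of the index, dropping the carry
        adjust = c >> 4 & 3
        return (src2 & ~3) | ((src2 + adjust) & 3)
    # otherwise XOR the selected $c bit into bit 0 of the index
    adjust = c >> slct & 1
    return src2 ^ adjust
```

With SLCT == 4 the index stays inside its aligned group of four registers: $r5 adjusted by 3 wraps around to $r4 rather than carrying into $r8.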


Instructions 


Load immediate: mov 


Loads a 19-bit signed immediate to the selected register. If you need to load a const that doesn't fit into 19 signed bits,
use this instruction along with sethi.


Instructions:
Instruction | Operands | Opcode
mov | $r[DST] IMM19 | 0x65
Operation:
$r[DST] = IMM19


Set high bits: sethi 


Loads a 16-bit immediate to high bits of the selected register. Low 16 bits are unaffected.


Instructions:
Instruction | Operands | Opcode
sethi | $r[DST] IMM16 | 0x75
Operation:
$r[DST] = ($r[DST] & 0xffff) | IMM16 << 16
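Loading an arbitrary 32-bit constant with mov + sethi can be sketched as below: mov supplies the low 16 bits (bit 18 of the immediate is clear, so no sign extension leaks into the high half) and sethi then overwrites the high 16 bits. The helper names are hypothetical.

```python
def sext(value, bit):
    """Sign-extend value, treating bit `bit` as the sign bit."""
    value &= (1 << (bit + 1)) - 1
    return value - (1 << (bit + 1)) if value >> bit & 1 else value


def exec_mov(imm19):
    # mov: $r[DST] = IMM19 (signed 19-bit), kept as a 32-bit register value
    return sext(imm19, 18) & 0xffffffff


def exec_sethi(reg, imm16):
    # sethi: $r[DST] = ($r[DST] & 0xffff) | IMM16 << 16
    return (reg & 0xffff) | (imm16 & 0xffff) << 16


def load_const(value):
    """Return a (mnemonic, immediate) sequence that loads value."""
    value &= 0xffffffff
    if -0x40000 <= sext(value, 31) < 0x40000:
        return [("mov", value & 0x7ffff)]  # fits in 19 signed bits
    return [("mov", value & 0xffff), ("sethi", value >> 16)]


def run(seq):
    """Apply the sequence to a register, as the scalar unit would."""
    reg = 0
    for op, imm in seq:
        reg = exec_mov(imm) if op == "mov" else exec_sethi(reg, imm)
    return reg
```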


Move to/from other register file: mov 


Does what it says on the tin. There is $c output capability, but it always outputs 0. The other register file is selected
by RFILE field, and the possibilities are:
* 0: $v word 0 (ie. bytes 0-3)
* 1: $v word 1 (bytes 4-7)
* 2: $v word 2 (bytes 8-11)
* 3: $v word 3 (bytes 12-15)
* 4: ??? (NV41:G80 only)
* 5: ??? (NV41:G80 only)
* 6: ??? (NV41:G80 only)
* 7: ??? (NV41:G80 only)
* 8: $sr
* 9: $mi
* 10: $uc
* 11: $l (indices over 3 are ignored on writes, wrapped modulo 4 on reads)
* 12: $a
* 13: $c - read only (indices over 3 read as 0)
* 18: curiously enough, aliases 2, for writes only
* 20: $m[0-31]
* 21: $m[32-63]
* 22: 5а (indices over 7 are wrapped modulo 8) (G80 only) 
* 23: $f (indices over 1 are wrapped modulo 2)
* 24: $x (indices over 15 are wrapped modulo 16) (G80 only)


Todo: figure out the pre-G80 register files 


Attempts to read or write unknown register file are ignored. In case of reads, the destination register is left unmodified. 


Instructions:
Instruction | Operands | Opcode
mov | [$c[CDST]] $<RFILE>[DST] $r[SRC1] | 0x6a
mov | [$c[CDST]] $r[DST] $<RFILE>[SRC1] | 0x6b
Operation:
if opcode == 0x6a:
    $<RFILE>[DST] = $r[SRC1]
else:
    $r[DST] = $<RFILE>[SRC1]


if CDST < 4:
    $c[CDST].scalar = 0


Arithmetic operations: mul, min, max, abs, neg, add, sub, shr, sar 


mul performs a 16x16 multiplication with a 32-bit result. shr and sar do a bitwise shift right by the given amount,
with negative amounts interpreted as a left shift (and the shift amount limited to -0x1f..0x1f). The other operations do
what they say on the tin. abs, min, max, mul, and sar treat the inputs as signed, shr as unsigned; for the others it
doesn't matter.


The first source comes from a register selected by SRC1, and the second comes from either a register selected by 
mangled field SRC2S or a 13-bit signed immediate IMM. In case of abs and neg, the second source is unused, and 
the immediate versions are redundant (and in fact one set of opcodes is used for mov to/from other register file instead). 


Most of these operations have duplicate opcodes. The canonical one is the lowest one.


All of these operations set the full set of scalar condition codes.
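From the opcode list and the `opcode & 0x20` test in the operation pseudocode, the 0x40-0x7f arithmetic group appears to decode by fields: the low nibble selects the operation, bit 5 selects the immediate form, and bit 4 separates shr from sar. A hypothetical decoder under that reading:

```python
# Low nibble of the opcode selects the operation within 0x40-0x7f.
ARITH_OPS = {0x1: "mul", 0x8: "min", 0x9: "max", 0xa: "abs",
             0xb: "neg", 0xc: "add", 0xd: "sub", 0xe: "shift"}


def decode_arith(opcode):
    """Return (mnemonic, uses_immediate) or None if not an arithmetic op."""
    if opcode >> 6 != 1 or opcode in (0x6a, 0x6b):
        # outside 0x40-0x7f, or the mov to/from other register file opcodes
        return None
    op = ARITH_OPS.get(opcode & 0xf)
    if op is None:
        return None
    if op == "shift":
        # 0x4e/0x6e are sar, 0x5e/0x7e are shr
        op = "shr" if opcode & 0x10 else "sar"
    # second source comes from IMM when opcode & 0x20 is set
    return op, bool(opcode & 0x20)
```

For abs and neg the immediate flag is meaningless, since the second source is unused.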




Instructions:
Instruction | Operands | Opcode
mul | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x41, 0x51
min | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x48, 0x58
max | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x49, 0x59
abs | [$c[CDST]] $r[DST] $r[SRC1] | 0x4a, 0x5a, 0x7a
neg | [$c[CDST]] $r[DST] $r[SRC1] | 0x4b, 0x5b, 0x7b
add | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x4c, 0x5c
sub | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x4d, 0x5d
sar | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x4e
shr | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x5e
mul | [$c[CDST]] $r[DST] $r[SRC1] IMM | 0x61, 0x71
min | [$c[CDST]] $r[DST] $r[SRC1] IMM | 0x68, 0x78
max | [$c[CDST]] $r[DST] $r[SRC1] IMM | 0x69, 0x79
add | [$c[CDST]] $r[DST] $r[SRC1] IMM | 0x6c, 0x7c
sub | [$c[CDST]] $r[DST] $r[SRC1] IMM | 0x6d, 0x7d
sar | [$c[CDST]] $r[DST] $r[SRC1] IMM | 0x6e
shr | [$c[CDST]] $r[DST] $r[SRC1] IMM | 0x7e
Operation:
s1 = sext($r[SRC1], 31)
if opcode & 0x20:
    s2 = sext(IMM, 12)
else:
    s2 = sext($r[SRC2S], 31)
if op == 'mul':
    res = sext(s1, 15) * sext(s2, 15)
elif op == 'min':
    res = min(s1, s2)
elif op == 'max':
    res = max(s1, s2)
elif op == 'abs':
    res = abs(s1)
elif op == 'neg':
    res = -s1
elif op == 'add':
    res = s1 + s2
elif op == 'sub':
    res = s1 - s2
elif op == 'shr' or op == 'sar':
    # shr/sar are unsigned/signed versions of the same insn
    if op == 'shr':
        s1 &= 0xffffffff
    # shift amount is 6-bit signed number
    shift = sext(s2, 5)
    # and -0x20 is invalid
    if shift == -0x20:
        shift = 0
    # negative shifts mean a left shift
    if shift < 0:
        res = s1 << -shift
    else:
        # sign of s1 matters here
        res = s1 >> shift
$r[DST] = res
# build $c result
cres = 0
if res & 1 << 31:
    cres |= 1
if res == 0:
    cres |= 2
if res & 1 << 19:
    cres |= 4
if (res ^ s1) & 1 << 20:
    cres |= 8
if res & 1 << 20:
    cres |= 0x10
if res & 1 << 21:
    cres |= 0x20
if variant == 'G80':
    if res & 1 << 19:
        cres |= 0x40
    if res & 1 << 18:
        cres |= 0x80
if CDST < 4:
    $c[CDST].scalar = cres
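The shr/sar shift-amount rules above (a 6-bit signed amount, -0x20 treated as 0, negative amounts shifting left) can be tried out with the following Python sketch. It models only the value written to $r[DST], not the $c flags; the function name `scalar_shift` and the final truncation to 32 bits are assumptions made for illustration.

```python
def sext(val, bit):
    # Sign-extend val, treating bit number `bit` as the sign bit
    # (the same convention as sext() in the notational conventions chapter).
    val &= (1 << (bit + 1)) - 1
    if val >> bit & 1:
        val -= 1 << (bit + 1)
    return val


def scalar_shift(op, src1, src2):
    """Hypothetical model of scalar shr/sar; returns the new $r[DST] value."""
    s1 = sext(src1, 31)
    if op == 'shr':
        s1 &= 0xffffffff          # shr treats its input as unsigned
    shift = sext(src2, 5)         # shift amount is a 6-bit signed number
    if shift == -0x20:            # -0x20 is invalid and behaves as 0
        shift = 0
    if shift < 0:
        res = s1 << -shift        # negative amount means a left shift
    else:
        res = s1 >> shift         # Python's >> is arithmetic on negative ints
    # Assumption: only the low 32 bits of the result land in the register.
    return res & 0xffffffff
```

For example, shifting 0x80000000 right by 4 gives 0xf8000000 with sar but 0x08000000 with shr, and a shift amount of 0x3f (that is, -1) turns either one into a left shift by 1.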


Bit operations: bitop 


Performs an arbitrary two-input bit operation on two registers, selected by SRC1 and SRC2. The $c output works, but only with a subset of flags.

Instructions:
Instruction | Operands | Opcode
bitop | BITOP [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2] | 0x42
Operation:
s1 = $r[SRC1]
s2 = $r[SRC2]
res = bitop(BITOP, s2, s1) & 0xffffffff
$r[DST] = res
# build $c result
cres = 0
# bit 0 not set
if res == 0:
    cres |= 2
if res & 1 << 19:
    cres |= 4
# bit 3 not set
if res & 1 << 20:
    cres |= 0x10
if res & 1 << 21:
    cres |= 0x20
if variant == 'G80':
    if res & 1 << 19:
        cres |= 0x40
    if res & 1 << 18:
        cres |= 0x80
if CDST < 4:
    $c[CDST].scalar = cres
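The BITOP mask follows the bitop() convention from the notational conventions chapter. As a rough illustration, here is a fixed-width Python version; `bitop2`, `scalar_bitop`, and the `width` parameter are hypothetical names introduced only for this sketch. Note that the instruction passes $r[SRC2] as the first input and $r[SRC1] as the second.

```python
def bitop2(op, first, second, width=32):
    """Apply 2-input bit operation `op` bit by bit to two width-bit values.

    For each bit position, (first bit) | (second bit) << 1 indexes into the
    4-bit mask `op`; the selected mask bit becomes the result bit.
    """
    res = 0
    for i in range(width):
        idx = (first >> i & 1) | (second >> i & 1) << 1
        res |= (op >> idx & 1) << i
    return res


def scalar_bitop(bitop_mask, r_src1, r_src2):
    # Mirrors "res = bitop(BITOP, s2, s1) & 0xffffffff" from the pseudocode.
    return bitop2(bitop_mask, r_src2, r_src1) & 0xffffffff
```

With mask 0xa the result is simply $r[SRC2] and with 0xc it is $r[SRC1], which is a quick way to check the operand order.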


Bit operations with immediate: and, or, xor 


Performs a given bitwise operation on a register and a 13-bit immediate. As with bitop, the $c output only works partially.

Instructions:
Instruction | Operands | Opcode
and | [$c[CDST]] $r[DST] $r[SRC1] IMM | 0x62
xor | [$c[CDST]] $r[DST] $r[SRC1] IMM | 0x63
or | [$c[CDST]] $r[DST] $r[SRC1] IMM | 0x64
Operation:
s1 = $r[SRC1]
if op == 'and':
    res = s1 & IMM
elif op == 'xor':
    res = s1 ^ IMM
elif op == 'or':
    res = s1 | IMM
$r[DST] = res
# build $c result
cres = 0
# bit 0 not set
if res == 0:
    cres |= 2
if res & 1 << 19:
    cres |= 4
# bit 3 not set
if res & 1 << 20:
    cres |= 0x10
if res & 1 << 21:
    cres |= 0x20
if variant == 'G80':
    if res & 1 << 19:
        cres |= 0x40
    if res & 1 << 18:
        cres |= 0x80
if CDST < 4:
    $c[CDST].scalar = cres

Simple bytewise operations: bmin, bmax, babs, bneg, badd, bsub

These perform the corresponding operation (minimum, maximum, absolute value, negation, addition, subtraction) in SIMD manner on 8-bit signed or unsigned numbers from one or two sources. Source 1 is always a register selected by the SRC1 bitfield. Source 2, if it is used (i.e. the instruction is neither babs nor bneg), is either a register selected by the SRC2S mangled bitfield, or an immediate taken from the BIMM bitfield.

Each of these instructions comes in signed and unsigned variants, and both perform result clipping. Note that babs is rather uninteresting in its unsigned variant (it's just the identity function), and so is bneg (the result is always 0 or clipped to 0).

These instructions have a $c output, but it's always set to all-0 if used.

Also note that babs and bneg have two redundant opcodes each: the bit that normally selects an immediate or register second source doesn't apply to them.


Instructions:
Instruction | Operands | Opcode
bmin s | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x08
bmax s | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x09
babs s | [$c[CDST]] $r[DST] $r[SRC1] | 0x0a
bneg s | [$c[CDST]] $r[DST] $r[SRC1] | 0x0b
badd s | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x0c
bsub s | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x0d
bmin u | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x18
bmax u | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x19
babs u | [$c[CDST]] $r[DST] $r[SRC1] | 0x1a
bneg u | [$c[CDST]] $r[DST] $r[SRC1] | 0x1b
badd u | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x1c
bsub u | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x1d
bmin s | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x28
bmax s | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x29
babs s | [$c[CDST]] $r[DST] $r[SRC1] | 0x2a
bneg s | [$c[CDST]] $r[DST] $r[SRC1] | 0x2b
badd s | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x2c
bsub s | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x2d
bmin u | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x38
bmax u | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x39
babs u | [$c[CDST]] $r[DST] $r[SRC1] | 0x3a
bneg u | [$c[CDST]] $r[DST] $r[SRC1] | 0x3b
badd u | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x3c
bsub u | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x3d
Operation:
for idx in range(4):
    s1 = $r[SRC1][idx]
    if opcode & 0x20:
        s2 = BIMM
    else:
        s2 = $r[SRC2S][idx]
    if opcode & 0x10:
        # unsigned
        s1 &= 0xff
        s2 &= 0xff
    else:
        # signed
        s1 = sext(s1, 7)
        s2 = sext(s2, 7)
    if op == 'bmin':
        res = min(s1, s2)
    elif op == 'bmax':
        res = max(s1, s2)
    elif op == 'babs':
        res = abs(s1)
    elif op == 'bneg':
        res = -s1
    elif op == 'badd':
        res = s1 + s2
    elif op == 'bsub':
        res = s1 - s2
    if opcode & 0x10:
        # unsigned: clip to 0..0xff
        if res < 0:
            res = 0
        if res > 0xff:
            res = 0xff
    else:
        # signed: clip to -0x80..0x7f
        if res < -0x80:
            res = -0x80
        if res > 0x7f:
            res = 0x7f
    $r[DST][idx] = res
if CDST < 4:
    $c[CDST].scalar = 0
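A compact Python sketch of these saturating byte operations on packed 32-bit registers follows. It assumes that byte idx of a register occupies bits 8*idx..8*idx+7 (byte 0 least significant), which the pseudocode does not state explicitly; `bytewise` is a hypothetical helper name. The immediate forms behave as if the second register held BIMM in every byte.

```python
def sext(val, bit):
    # Sign-extend val, treating bit number `bit` as the sign bit.
    val &= (1 << (bit + 1)) - 1
    if val >> bit & 1:
        val -= 1 << (bit + 1)
    return val


def bytewise(op, r1, r2, signed):
    """Hypothetical model of bmin/bmax/babs/bneg/badd/bsub on 32-bit registers."""
    out = 0
    for idx in range(4):
        s1 = r1 >> (8 * idx) & 0xff
        s2 = r2 >> (8 * idx) & 0xff
        if signed:
            s1, s2 = sext(s1, 7), sext(s2, 7)
        if op == 'bmin':
            res = min(s1, s2)
        elif op == 'bmax':
            res = max(s1, s2)
        elif op == 'babs':
            res = abs(s1)
        elif op == 'bneg':
            res = -s1
        elif op == 'badd':
            res = s1 + s2
        elif op == 'bsub':
            res = s1 - s2
        else:
            raise ValueError('unknown op: ' + op)
        # clip to the signed or unsigned byte range
        lo, hi = (-0x80, 0x7f) if signed else (0, 0xff)
        res = max(lo, min(hi, res))
        out |= (res & 0xff) << (8 * idx)
    return out
```

Each lane saturates on its own: adding 1 to 0x10ff80f0 in every unsigned byte gives 0x11ff81f1, because the 0xff lane clips instead of carrying into its neighbour.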


Bytewise bit operations: band, bor, bxor 


Performs a given bitwise operation on a register and an 8-bit immediate replicated 4 times. Or, interpreted differently, performs such an operation on every byte of a register independently. The $c output is present, but always outputs 0.

Instructions:
Instruction | Operands | Opcode
and | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x25
or | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x26
xor | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x27
Operation:
for idx in range(4):
    if op == 'and':
        $r[DST][idx] = $r[SRC1][idx] & BIMM
    elif op == 'or':
        $r[DST][idx] = $r[SRC1][idx] | BIMM
    elif op == 'xor':
        $r[DST][idx] = $r[SRC1][idx] ^ BIMM
if CDST < 4:
    $c[CDST].scalar = 0


Bytewise bit shift operations: bshr, bsar 


Performs a bytewise SIMD right shift. Like the usual shift instruction, the shift amount is considered signed, and negative amounts result in a left shift. In this case, the shift amount is a 4-bit signed number. Operands are as in the usual bytewise operations.

Instructions:
Instruction | Operands | Opcode
bsar | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x0e
bshr | [$c[CDST]] $r[DST] $r[SRC1] $r[SRC2S] | 0x1e
bsar | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x2e
bshr | [$c[CDST]] $r[DST] $r[SRC1] BIMM | 0x3e
Operation:
for idx in range(4):
    s1 = $r[SRC1][idx]
    if opcode & 0x20:
        s2 = BIMM
    else:
        s2 = $r[SRC2S][idx]
    if opcode & 0x10:
        # unsigned
        s1 &= 0xff
    else:
        # signed
        s1 = sext(s1, 7)
    shift = sext(s2, 3)
    if shift < 0:
        res = s1 << -shift
    else:
        res = s1 >> shift
    $r[DST][idx] = res
if CDST < 4:
    $c[CDST].scalar = 0
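The per-byte shift can be sketched in Python as below. As with the other bytewise sketches, byte idx is assumed to sit at bits 8*idx..8*idx+7, and each lane's result is assumed to be truncated to 8 bits when written back; `bshift` is a hypothetical name. The BIMM forms correspond to passing BIMM replicated into every byte of `r2`.

```python
def sext(val, bit):
    # Sign-extend val, treating bit number `bit` as the sign bit.
    val &= (1 << (bit + 1)) - 1
    if val >> bit & 1:
        val -= 1 << (bit + 1)
    return val


def bshift(r1, r2, signed):
    """Hypothetical model of bsar (signed=True) and bshr (signed=False)."""
    out = 0
    for idx in range(4):
        s1 = r1 >> (8 * idx) & 0xff
        s2 = r2 >> (8 * idx) & 0xff
        if signed:
            s1 = sext(s1, 7)
        shift = sext(s2, 3)              # 4-bit signed shift amount
        if shift < 0:
            res = s1 << -shift           # negative amount: left shift
        else:
            res = s1 >> shift            # arithmetic for negative (signed) s1
        out |= (res & 0xff) << (8 * idx)  # assumption: keep the low 8 bits
    return out
```

Shifting 0x80 right by one gives 0xc0 with bsar (the sign bit is replicated) but 0x40 with bshr.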




Bytewise multiplication: bmul 


These instructions perform bytewise fractional multiplication: the inputs and outputs are considered to be fixed-point numbers with 8 fractional bits (unsigned version) or 7 fractional bits (signed version). The signedness of both inputs and the output can be controlled independently (the signedness of the output is controlled by the opcode, and that of the inputs by the instruction word flags SIGN1 and SIGN2). The results are clipped to the output range. There are two rounding modes: round down, and round to nearest with ties rounded up.

The first source is always a register selected by the SRC1 bitfield. The second source can be a register selected by the SRC2 bitfield, or a 6-bit immediate in the BIMMMUL bitfield padded with two zero bits on the right.

Note that besides the proper 0xX1 opcodes, there are also 0xX2 bad opcodes. For register-register ops, these opcodes are just aliases of the sane ones, but for immediate opcodes, a colliding bitfield (BIMMBAD) is used.

The instructions have no $c output capability.

Instructions:
Instruction | Operands | Opcode
bmul s | RND $r[DST] SIGN1 $r[SRC1] SIGN2 $r[SRC2] | 0x01, 0x02
bmul u | RND $r[DST] SIGN1 $r[SRC1] SIGN2 $r[SRC2] | 0x11, 0x12
bmul s | RND $r[DST] SIGN1 $r[SRC1] SIGN2 BIMMMUL | 0x21
bmul u | RND $r[DST] SIGN1 $r[SRC1] SIGN2 BIMMMUL | 0x31
bmul s | RND $r[DST] SIGN1 $r[SRC1] SIGN2 BIMMBAD | 0x22 (bad opcode)
bmul u | RND $r[DST] SIGN1 $r[SRC1] SIGN2 BIMMBAD | 0x32 (bad opcode)
Operation:
for idx in range(4):
    # read inputs
    s1 = $r[SRC1][idx]
    if opcode & 0x20:
        if opcode & 2:
            s2 = BIMMBAD
        else:
            s2 = BIMMMUL << 2
    else:
        s2 = $r[SRC2][idx]
    # convert inputs to 8 fractional bits - unsigned inputs are already ok
    ss1 = s1
    ss2 = s2
    if SIGN1:
        ss1 = sext(s1, 7) << 1
    if SIGN2:
        ss2 = sext(s2, 7) << 1
    # multiply - the result has 16 fractional bits
    res = ss1 * ss2
    if opcode & 0x10:
        # unsigned result
        # first, if round to nearest is selected, apply rounding correction
        if RND == 'rn':
            res += 0x80
        # convert to 8 fractional bits
        res >>= 8
        # clip
        if res < 0:
            res = 0
        if res > 0xff:
            res = 0xff
    else:
        # signed result
        if RND == 'rn':
            res += 0x100
        # convert to 7 fractional bits
        res >>= 9
        # clip
        if res < -0x80:
            res = -0x80
        if res > 0x7f:
            res = 0x7f
    $r[DST][idx] = res
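To see how the input normalization, rounding, and clipping interact, here is a Python sketch of a single bmul lane. The function name `bmul_lane` and its parameters are hypothetical; `rnd='rn'` selects round to nearest and any other value is treated as round down. For signed outputs the function returns the signed value before it is stored as a byte.

```python
def sext(val, bit):
    # Sign-extend val, treating bit number `bit` as the sign bit.
    val &= (1 << (bit + 1)) - 1
    if val >> bit & 1:
        val -= 1 << (bit + 1)
    return val


def bmul_lane(s1, s2, sign1, sign2, signed_out, rnd):
    """Hypothetical model of one bmul lane on raw input bytes s1, s2."""
    # convert inputs to 8 fractional bits (unsigned inputs already are)
    ss1 = sext(s1, 7) << 1 if sign1 else s1 & 0xff
    ss2 = sext(s2, 7) << 1 if sign2 else s2 & 0xff
    res = ss1 * ss2                      # product has 16 fractional bits
    if not signed_out:
        if rnd == 'rn':
            res += 0x80
        res >>= 8                        # 8 fractional bits
        return max(0, min(0xff, res))
    if rnd == 'rn':
        res += 0x100
    res >>= 9                            # 7 fractional bits
    return max(-0x80, min(0x7f, res))
```

In the signed case, 0x40 * 0x40 (0.5 * 0.5) yields 0x20 (0.25), while 0x80 * 0x80 ((-1.0) * (-1.0)) would be +1.0, which is out of range and clips to 0x7f.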


Send immediate to vector unit: vec 


This instruction takes two 9-bit immediate operands and sends them as factors to the vector unit. The first immediate is used as factors 0 and 1, and the second is used as factors 2 and 3. The $vc selection is sent as well.

Instructions:
Instruction | Operands | Opcode
vec | FACTOR1 FACTOR2 $vc[VCIDX] VCFLAG VCXFRM | 0x24

Operation:
s2v.factor[0] = s2v.factor[1] = FACTOR1
s2v.factor[2] = s2v.factor[3] = FACTOR2
s2v.vcsel.idx = VCIDX
s2v.vcsel.flag = VCFLAG
s2v.vcsel.xfrm = VCXFRM


Send mask to vector unit and shift: vecms 


This instruction shifts a register right by 4 bits and uses the bits shifted out as s2v mask 0 after expansion (each bit is replicated 4 times). The s2v factors are derived from that mask and are not very useful. The right shift is sign-filling. The $vc selection is sent as well.

Instructions:
Instruction | Operands | Opcode
vecms | $r[SRC1] $vc[VCIDX] VCFLAG VCXFRM | 0x45

Operation:
val = sext($r[SRC1], 31)
$r[SRC1] = val >> 4
# the factors are made so that the mask derived from them will contain
# each bit from the short mask repeated 4 times
f0 = 0
f1 = 0
if val & 1:
    f0 |= 0x1e
if val & 2:
    f0 |= 0x1e0
if val & 4:
    f1 |= 0x1e
if val & 8:
    f1 |= 0x1e0
s2v.factor[0] = f0
s2v.factor[1] = f1
s2v.factor[2] = s2v.factor[3] = 0
s2v.vcsel.idx = VCIDX
s2v.vcsel.flag = VCFLAG
s2v.vcsel.xfrm = VCXFRM
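A small Python sketch of vecms is given below: it returns the updated register value and the four factors. It does not model how the vector unit later turns factors into a mask, and the name `vecms` and the 32-bit write-back are assumptions for illustration.

```python
def sext(val, bit):
    # Sign-extend val, treating bit number `bit` as the sign bit.
    val &= (1 << (bit + 1)) - 1
    if val >> bit & 1:
        val -= 1 << (bit + 1)
    return val


def vecms(r):
    """Hypothetical model of vecms: returns (new $r[SRC1], [factor0..factor3])."""
    val = sext(r, 31)
    new_r = (val >> 4) & 0xffffffff   # sign-filling shift, stored back as 32 bits
    f0 = f1 = 0
    # each of the 4 shifted-out bits becomes a run of four set bits in a factor
    if val & 1:
        f0 |= 0x1e
    if val & 2:
        f0 |= 0x1e0
    if val & 4:
        f1 |= 0x1e
    if val & 8:
        f1 |= 0x1e0
    return new_r, [f0, f1, 0, 0]
```

Calling it repeatedly on the same register consumes 4 mask bits per call, which is why the shift is sign-filling: a negative register keeps supplying set bits.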


Send bytes to vector unit: bvec 


Treats a register as a 4-byte vector and sends the bytes as s2v factors (treating them as signed with 7 fractional bits). The $vc selection is sent as well. If the s2v output is used as masks, this effectively takes mask 0 from source bits 0-15 and mask 1 from source bits 16-31.

Instructions:
Instruction | Operands | Opcode
bvec | $r[SRC1] $vc[VCIDX] VCFLAG VCXFRM | 0x0f

Operation:
for idx in range(4):
    s2v.factor[idx] = sext($r[SRC1][idx], 7) << 1
s2v.vcsel.idx = VCIDX
s2v.vcsel.flag = VCFLAG
s2v.vcsel.xfrm = VCXFRM
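Since bvec only sign-extends each byte and shifts it left by one (converting 7 fractional bits to 8), its factor computation fits in a few lines of Python. The name `bvec_factors` is hypothetical, and the byte order (byte idx at bits 8*idx..8*idx+7) is an assumption.

```python
def sext(val, bit):
    # Sign-extend val, treating bit number `bit` as the sign bit.
    val &= (1 << (bit + 1)) - 1
    if val >> bit & 1:
        val -= 1 << (bit + 1)
    return val


def bvec_factors(r):
    """Hypothetical model of the four s2v factors bvec derives from a register."""
    return [sext(r >> (8 * idx) & 0xff, 7) << 1 for idx in range(4)]
```

For instance, the byte 0x80 (-1.0 with 7 fractional bits) becomes the factor -256 (-1.0 with 8 fractional bits).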


Bytewise multiply, add, and send to vector unit: bvecmad, bvecmadsel 


Figure out this one yourself. It sends s2v factors based on a SIMD multiply & add, uses weird source mangling, and even weirder source 1 bitfields.

Instructions:
Instruction | Operands | Opcode
bvecmad | $r[SRC1] $r[SRC2]q $vc[VCIDX] VCFLAG VCXFRM | 0x04
bvecmadsel | $r[SRC1] $r[SRC2]q $vc[VCIDX] VCFLAG VCXFRM | 0x05

Operation:
if SLCT == 4:
    adjust = $c[COND] >> 4 & 3
else:
    adjust = $c[COND] >> SLCT & 1
# SRC1 selects the pre-factor, which will be multiplied by source 3
if op == 'bvecmad':
    prefactor = $r[SRC1] >> 11 & 0xff
elif op == 'bvecmadsel':
    prefactor = $r[SRC1] >> 11 & 0x7f
s2a = $r[SRC2 | adjust]
s2b = $r[SRC2 | 2 | adjust]
for idx in range(4):
    # this time source is mangled by OR, not XOR - don't ask me
    if op == 'bvecmad':
        midx = idx
    elif op == 'bvecmadsel':
        midx = idx & 2
        if SLCT == 2 and $c[COND] & 0x80:
            midx |= 1
    # baseline (res will have 16 fractional bits, sources have 8)
    res = s2a[midx] << 8
    # throw in the multiplication result
    res += prefactor * s2b[idx]
    # and rounding correction (for round to nearest, ties up)
    res += 0x40
    # and round to 9 fractional bits
    s2v.factor[idx] = res >> 7
s2v.vcsel.idx = VCIDX
s2v.vcsel.flag = VCFLAG
s2v.vcsel.xfrm = VCXFRM


Vector unit 


Contents

* Vector unit
  - Introduction
    * Vector registers
  - Instruction format
    * Opcodes
  - Multiplication, accumulation, and rounding
  - Instructions
    * Move: mov
    * Move immediate: vmov
    * Move from $vc: mov
    * Swizzle: vswz
    * Simple arithmetic operations: vmin, vmax, vabs, vneg, vadd, vsub
    * Clip to range: vclip
    * Minimum of absolute values: vminabs
    * Add 9-bit: vadd9
    * Compare with absolute difference: vcmpad
    * Bit operations: vbitop
    * Bit operations with immediate: vand, vor, vxor
    * Shift operations: vshr, vsar
    * Linear interpolation: vlrp
    * Multiply and multiply with accumulate: vmul, vmac
    * Dual multiply and add/accumulate: vmac2, vmad2
    * Dual linear interpolation: vlrp2
    * Quad linear interpolation, part 1: vlrp4a
    * Factor linear interpolation: vlrpf
    * Quad linear interpolation, part 2: vlrp4b


Introduction 


The vector unit is one of the four execution units of VP1. It operates in SIMD manner on 16-element vectors. 


Vector registers 


The vector unit has 32 vector registers, $v0-$v31. They are 128 bits wide and are treated as 16 components of 8 bits each. Depending on the instruction, they can be treated as signed or unsigned.


There are also 4 vector condition code registers, $vc0-$vc3. They are like $c for vector registers: each of them has 16 "sign flag" and 16 "zero flag" bits, one of each per vector component. When read as a 32-bit word, bits 0-15 are the sign flags and bits 16-31 are the zero flags.


Further, the vector unit has a singular 448-bit vector accumulator register, $va. It is made of 16 components, each of 
them a 28-bit signed number with 16 fractional bits. It's used to store intermediate unrounded results of multiply-add 
computations. 


Finally, there's an extra 128-bit register, $vx, which works quite like the usual $v registers. It's only read by vlrp4b instructions and written only by special load to vector extra register instructions. The reasons for its existence are unclear.


Instruction format 


The instruction word fields used in vector instructions in addition to the ones used in scalar instructions are:

* bit 0: S2VMODE - selects how s2v data is used:
  - 0: factor - s2v data is interpreted as factors
  - 1: mask - s2v data is interpreted as masks
* bits 0-2: VCDST - if < 4, index of the $vc register to set according to the instruction's results. Otherwise, an indication that $vc is not to be written (the canonical value for such case appears to be 7).
* bits 0-1: VCSRC - selects $vc input for vlrp2
* bit 2: VCSEL - the $vc flag selection for vlrp2:
  - 0: sf
  - 1: zf
* bit 3: SWZLOHI - selects how the swizzle selectors are decoded:
  - 0: lo - bits 0-3 are component selector, bit 4 is source selector
  - 1: hi - bits 4-7 are component selector, bit 0 is source selector
* bit 3: FRACTINT - selects whether the multiplication is considered to be integer or fixed-point:
  - 0: fract: fixed-point
  - 1: int: integer
* bit 4: HILO - selects which part of the multiplication result to read:
  - 0: hi: high part
  - 1: lo: low part
* bits 5-7: SHIFT - a 3-bit signed immediate, used as an extra right shift factor
* bits 4-8: SRC3 - the third source $v register.
* bit 9: ALTRND - like RND, but for different instructions
* bit 9: SIGNS - determines if double-interpolation input is signed
  - 0: u - unsigned
  - 1: s - signed
* bit 10: LRP2X - determines if base input is XORed with 0x80 for vlrp2.
* bit 11: VAWRITE - determines if $va is written for vlrp2.
* bits 11-13: ALTSHIFT - a 3-bit signed immediate, used as an extra right shift factor
* bit 12: SIGND - determines if double-interpolation output is signed
  - 0: u - unsigned
  - 1: s - signed
* bits 19-22: CMPOP - selects the bit operation to perform on comparison result and previous flag value


Opcodes 


The opcode range assigned to the vector unit is 0x80-0xbf. The opcodes are:

* 0x80, 0xa0, 0xb0, 0x81, 0x91, 0xa1, 0xb1: multiplication: vmul
* 0x90: linear interpolation: vlrp
* 0x82, 0x92, 0xa2, 0xb2, 0x83, 0x93, 0xa3: multiplication with accumulation: vmac
* 0x84, 0x85, 0x95: dual multiplication with accumulation: vmac2
* 0x86, 0x87, 0x97: dual multiplication with addition: vmad2
* 0x96, 0xa6, 0xa7: dual multiplication with addition: vmad2 (bad opcode)
* 0x94: bitwise operation: vbitop
* 0xa4: clip to range: vclip
* 0xa5: minimum of absolute values: vminabs
* 0xb3: dual linear interpolation: vlrp2
* 0xb4: quad linear interpolation, part 1: vlrp4a
* 0xb5: factor linear interpolation: vlrpf
* 0xb6, 0xb7: quad linear interpolation, part 2: vlrp4b
* 0x88, 0x98, 0xa8, 0xb8: minimum: vmin
* 0x89, 0x99, 0xa9, 0xb9: maximum: vmax
* 0x8a, 0x9a: absolute value: vabs
* 0xaa: immediate and: vand
* 0xba: move: mov
* 0x8b: negation: vneg
* 0x9b: swizzle: vswz
* 0xab: immediate xor: vxor
* 0xbb: move from $vc: mov
* 0x8c, 0x9c, 0xac, 0xbc: addition: vadd
* 0x8d, 0x9d, 0xbd: subtraction: vsub
* 0xad: move immediate: vmov
* 0x8e, 0x9e, 0xae, 0xbe: shift: vshr, vsar
* 0x8f: compare with absolute difference: vcmpad
* 0x9f: add 9-bit: vadd9
* 0xaf: immediate or: vor
* 0xbf: the canonical vector nop opcode


Multiplication, accumulation, and rounding 


The most advanced vector instructions involve multiplication and the vector accumulator. The vector unit has two multipliers (signed 10-bit * 10-bit -> signed 20-bit) and three wide adders (performing 28-bit addition): the first two add the multiplication results, and the third adds a rounding correction. In other words, it can compute A + (B * C << S) + (D * E << S) + R, where A is a 28-bit input, B, C, D, E are signed 10-bit inputs, S is either 0 or 8, and R is the rounding correction, determined from the readout parameters. The B, C, D, E inputs can in turn be computed from other inputs using one of the narrower ALUs.


The A input can come from the vector accumulator, be fixed to 0, or come from a vector register component shifted 
by some shift amount. The shift amount, if used, is the inverse of the shift amount used by the readout process. 
There are three things that can happen to the result of the multiply-accumulate calculations:
* written in its entirety to the vector accumulator
* shifted, rounded, clipped, and written to a vector register
* both of the above

The vector register readout process takes the following parameters:
* sign: whether the result should be unsigned or signed
* fract/int selection: if int, the multiplication is considered to be done on integers, and the 16-bit result is at bits 8-23 of the value added to the accumulator (i.e. S is 8). Otherwise, the multiplication is performed as if the inputs were fractions (unsigned with 8 fractional bits, signed with 7), and the results are aligned so that bits 16-27 of the accumulator are the integer part, and bits 0-15 are the fractional part.
* hi/lo selection: selects whether the high or low 8 bits of the result are read. For integers, the result is treated as a 16-bit integer. For fractions, the high part is either an unsigned fixed-point number with 8 fractional bits, or a signed number with 7 fractional bits, and the low part is always 8 bits lower than the high part.
* a right shift, in the range -4..3: the result is shifted right by that amount before readout (as usual, negative means left shift).
* rounding mode: either round down, or round to nearest. If round to nearest is selected, a configuration bit in the $uccfg register selects whether ties are rounded up or down (to accommodate video codecs which switch that on a per-frame basis).


First, any inputs from vector registers are read, converted to signed or unsigned integers, and normalized if needed:

def mad_input(val, fractint, isign):
    if isign == 'u':
        return val & 0xff
    else:
        if fractint == 'int':
            return sext(val, 7)
        else:
            return sext(val, 7) << 1


The readout shift factor is determined as follows:

def mad_shift(fractint, sign, shift):
    if fractint == 'int':
        return 16 - shift
    elif sign == 'u':
        return 8 - shift
    elif sign == 's':
        return 9 - shift


If A is taken from a vector register, it's expanded as follows:

def mad_expand(val, fractint, sign, shift):
    return val << mad_shift(fractint, sign, shift)


The actual multiply-add process works like this:

def mad(a, b, c, d, e, rnd, fractint, sign, shift, hilo):
    res = a
    if fractint == 'fract':
        res += b * c + d * e
    else:
        res += (b * c + d * e) << 8
    # rounding correction
    if rnd == 'rn':
        # determine the final readout shift
        if hilo == 'lo':
            rshift = mad_shift(fractint, sign, shift) - 8
        else:
            rshift = mad_shift(fractint, sign, shift)
        # only add rounding correction if there's going to be an actual
        # right shift
        if rshift > 0:
            res += 1 << (rshift - 1)
            if $uccfg.tiernd == 'down':
                res -= 1
    # the accumulator is only 28 bits long, and it wraps
    return sext(res, 27)


And the readout process is:

def mad_read(val, fractint, sign, shift, hilo):
    # first, shift it to the position
    rshift = mad_shift(fractint, sign, shift) - 8
    if rshift >= 0:
        res = val >> rshift
    else:
        res = val << -rshift
    # second, clip to 16-bit signed or unsigned
    if sign == 'u':
        if res < 0:
            res = 0
        if res > 0xffff:
            res = 0xffff
    else:
        if res < -0x8000:
            res = -0x8000
        if res > 0x7fff:
            res = 0x7fff
    # finally, extract high/low part of the final result
    if hilo == 'hi':
        return res >> 8 & 0xff
    else:
        return res & 0xff


Note that high/low selection, apart from actual result readout, also affects the rounding computation. This means that, if rounding is desired and the full 16-bit result is to be read, the low part should be read first with rounding (which will add the rounding correction to the accumulator) and then the high part should be read without rounding (since the rounding correction is already applied).
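The functions above can be combined into a runnable Python sketch for experimenting with rounding. This is a reimplementation for illustration, not hardware-verified code. The `tiernd` keyword argument is a hypothetical stand-in for the $uccfg tie-rounding bit, and, following the comment in mad, the correction is applied only when the readout shift is strictly positive.

```python
def sext(val, bit):
    # Sign-extend val, treating bit number `bit` as the sign bit.
    val &= (1 << (bit + 1)) - 1
    if val >> bit & 1:
        val -= 1 << (bit + 1)
    return val


def mad_input(val, fractint, isign):
    if isign == 'u':
        return val & 0xff
    if fractint == 'int':
        return sext(val, 7)
    return sext(val, 7) << 1


def mad_shift(fractint, sign, shift):
    if fractint == 'int':
        return 16 - shift
    return 8 - shift if sign == 'u' else 9 - shift


def mad(a, b, c, d, e, rnd, fractint, sign, shift, hilo, tiernd='up'):
    prod = b * c + d * e
    res = a + (prod if fractint == 'fract' else prod << 8)
    if rnd == 'rn':
        rshift = mad_shift(fractint, sign, shift)
        if hilo == 'lo':
            rshift -= 8
        if rshift > 0:
            res += 1 << (rshift - 1)
            if tiernd == 'down':   # hypothetical stand-in for $uccfg.tiernd
                res -= 1
    return sext(res, 27)           # the 28-bit accumulator wraps


def mad_read(val, fractint, sign, shift, hilo):
    rshift = mad_shift(fractint, sign, shift) - 8
    res = val >> rshift if rshift >= 0 else val << -rshift
    lo, hi = (0, 0xffff) if sign == 'u' else (-0x8000, 0x7fff)
    res = max(lo, min(hi, res))
    return (res >> 8) & 0xff if hilo == 'hi' else res & 0xff
```

With signed fixed-point inputs 0x41 and 0x40 (normalized to 0x82 and 0x80), the product 0x4100 lies exactly halfway between two high-part results. Round to nearest therefore reads 0x21 with ties rounded up and 0x20 with ties rounded down. Reading the low part first with rounding and then the high part without rounding gives the full 16-bit value 0x2080.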
Instructions

Move: mov

Copies one register to another. The $vc output is supported for the zero flag only.

Instructions:
Instruction | Operands | Opcode
mov | [$vc[VCDST]] $v[DST] $v[SRC1] | 0xba

Operation:
for idx in range(16):
    $v[DST][idx] = $v[SRC1][idx]
    if VCDST < 4:
        $vc[VCDST].sf[idx] = 0
        $vc[VCDST].zf[idx] = $v[DST][idx] == 0


Move immediate: vmov 


Loads an 8-bit immediate to each component of the destination. The $vc output is fully supported, with the sign flag set to bit 7 of the value.

Instructions:
Instruction | Operands | Opcode
vmov | [$vc[VCDST]] $v[DST] BIMM | 0xad

Operation:
for idx in range(16):
    $v[DST][idx] = BIMM
    if VCDST < 4:
        $vc[VCDST].sf[idx] = BIMM >> 7 & 1
        $vc[VCDST].zf[idx] = BIMM == 0


Move from $vc: mov 


Reads the contents of all $vc registers to a selected vector register. Bytes 0-3 correspond to $vc0, bytes 4-7 to $vc1, and so on. Within each group, the sign flags are in bytes 0-1, and the zero flags are in bytes 2-3.

Instructions:
Instruction | Operands | Opcode
mov | $v[DST] $vc | 0xbb

Operation:
for idx in range(4):
    $v[DST][idx * 4] = $vc[idx].sf & 0xff
    $v[DST][idx * 4 + 1] = $vc[idx].sf >> 8 & 0xff
    $v[DST][idx * 4 + 2] = $vc[idx].zf & 0xff
    $v[DST][idx * 4 + 3] = $vc[idx].zf >> 8 & 0xff
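The byte layout produced by this mov can be reproduced with a short Python sketch. Here each $vc register is represented as a hypothetical (sf, zf) pair of 16-bit masks, and the result is a list of the 16 destination bytes; `pack_vc` is an illustrative name only.

```python
def pack_vc(vc_regs):
    """Pack four (sf, zf) 16-bit flag-mask pairs ($vc0..$vc3) into 16 bytes."""
    out = [0] * 16
    for idx, (sf, zf) in enumerate(vc_regs):
        out[idx * 4] = sf & 0xff             # sign flags for components 0-7
        out[idx * 4 + 1] = sf >> 8 & 0xff    # sign flags for components 8-15
        out[idx * 4 + 2] = zf & 0xff         # zero flags for components 0-7
        out[idx * 4 + 3] = zf >> 8 & 0xff    # zero flags for components 8-15
    return out
```

So $vc0 with sf = 0x1234 and zf = 0xabcd lands in bytes 0-3 as 0x34, 0x12, 0xcd, 0xab.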


Swizzle: vswz 


Performs a swizzle, also known as a shuffle: builds a result vector from arbitrarily selected components of two input vectors. There are three source vectors: sources 1 and 2 supply the data to be used, while source 3 selects the mapping of output vector components to input vector components. Each component of source 3 consists of a source selector and a component selector. They select the source (1 or 2) and its component that will be used as the corresponding component of the result.

Instructions:
Instruction | Operands | Opcode
vswz | SWZLOHI $v[DST] $v[SRC1] $v[SRC2] $v[SRC3] | 0x9b

Operation:
for idx in range(16):
    # read the component and source selectors
    if SWZLOHI == 'lo':
        comp = $v[SRC3][idx] & 0xf
        src = $v[SRC3][idx] >> 4 & 1
    else:
        comp = $v[SRC3][idx] >> 4 & 0xf
        src = $v[SRC3][idx] & 1
    # read the source & component
    if src == 0:
        $v[DST][idx] = $v[SRC1][comp]
    else:
        $v[DST][idx] = $v[SRC2][comp]
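As a concrete illustration, the following Python sketch applies vswz to plain 16-element lists. `vswz` here is an illustrative function, not an existing API. The example selector below interleaves the low halves of the two sources, a common use of shuffles.

```python
def vswz(src1, src2, src3, swzlohi):
    """Model of vswz on 16-element lists of byte values."""
    out = []
    for idx in range(16):
        sel = src3[idx]
        if swzlohi == 'lo':
            comp, src = sel & 0xf, sel >> 4 & 1       # bits 0-3 comp, bit 4 src
        else:
            comp, src = sel >> 4 & 0xf, sel & 1       # bits 4-7 comp, bit 0 src
        out.append((src1 if src == 0 else src2)[comp])
    return out


a = list(range(16))
b = list(range(100, 116))
# selector for component i: take component i // 2 from source (i & 1)
interleave = [(i >> 1) | (i & 1) << 4 for i in range(16)]
print(vswz(a, b, interleave, 'lo')[:4])
```

The printed prefix is `[0, 100, 1, 101]`: even output components come from source 1 and odd ones from source 2.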


Simple arithmetic operations: vmin, vmax, vabs, vneg, vadd, vsub 


These perform the corresponding operation (minimum, maximum, absolute value, negation, addition, subtraction) in SIMD manner on 8-bit signed or unsigned numbers from one or two sources. Source 1 is always a register selected by the SRC1 bitfield. Source 2, if it is used (i.e. the instruction is neither vabs nor vneg), is either a register selected by the SRC2 bitfield, or an immediate taken from the BIMM bitfield.

Most of these instructions come in signed and unsigned variants, and both perform result clipping. The exception is vneg, which only has a signed version. Note that vabs is rather uninteresting in its unsigned variant (it's just the identity function). Note also that vsub lacks a signed version with an immediate: it can be replaced with vadd with a negated immediate.

The $vc output is fully supported. For signed variants, the sign flag output is the sign of the result. For unsigned variants, the sign flag is used as an overflow flag: it's set if the true unclipped result is not in the 0..0xff range.


Instructions:
Instruction | Operands | Opcode
vmin s | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0x88
vmax s | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0x89
vabs s | [$vc[VCDST]] $v[DST] $v[SRC1] | 0x8a
vneg s | [$vc[VCDST]] $v[DST] $v[SRC1] | 0x8b
vadd s | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0x8c
vsub s | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0x8d
vmin u | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0x98
vmax u | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0x99
vabs u | [$vc[VCDST]] $v[DST] $v[SRC1] | 0x9a
vadd u | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0x9c
vsub u | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0x9d
vmin s | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xa8
vmax s | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xa9
vadd s | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xac
vmin u | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xb8
vmax u | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xb9
vadd u | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xbc
vsub u | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xbd

Operation:
for idx in range(16):
    s1 = $v[SRC1][idx]
    if opcode & 0x20:
        s2 = BIMM
    else:
        s2 = $v[SRC2][idx]
    if opcode & 0x10:
        # unsigned
        s1 &= 0xff
        s2 &= 0xff
    else:
        # signed
        s1 = sext(s1, 7)
        s2 = sext(s2, 7)
    if op == 'vmin':
        res = min(s1, s2)
    elif op == 'vmax':
        res = max(s1, s2)
    elif op == 'vabs':
        res = abs(s1)
    elif op == 'vneg':
        res = -s1
    elif op == 'vadd':
        res = s1 + s2
    elif op == 'vsub':
        res = s1 - s2
    sf = 0
    if opcode & 0x10:
        # unsigned: clip to 0..0xff
        if res < 0:
            res = 0
            sf = 1
        if res > 0xff:
            res = 0xff
            sf = 1
    else:
        # signed: clip to -0x80..0x7f
        if res < 0:
            sf = 1
        if res < -0x80:
            res = -0x80
        if res > 0x7f:
            res = 0x7f
    $v[DST][idx] = res
    if VCDST < 4:
        $vc[VCDST].sf[idx] = sf
        $vc[VCDST].zf[idx] = res == 0


Clip to range: vclip 


Performs a SIMD range clipping operation: the first source is the value to clip, and the second and third sources are the range endpoints. Or, equivalently, calculates the median of three inputs. The $vc output is supported, with the sign flag set if clipping was performed (a value equal to a range endpoint is considered to be clipped) or the range is improper (second endpoint not larger than the first). All inputs are treated as signed.

Instructions:
Instruction | Operands | Opcode
vclip | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] $v[SRC3] | 0xa4

Operation:
for idx in range(16):
    s1 = sext($v[SRC1][idx], 7)
    s2 = sext($v[SRC2][idx], 7)
    s3 = sext($v[SRC3][idx], 7)
    sf = 0
    # determine endpoints
    if s2 < s3:
        # proper order
        start = s2
        end = s3
    else:
        # reverse order
        start = s3
        end = s2
        sf = 1
    # and clip
    res = s1
    if res <= start:
        res = start
        sf = 1
    if res >= end:
        res = end
        sf = 1
    $v[DST][idx] = res
    if VCDST < 4:
        $vc[VCDST].sf[idx] = sf
        $vc[VCDST].zf[idx] = res == 0
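A single vclip lane can be modelled in Python as below, returning the signed result together with the sign flag. `vclip_lane` is an illustrative name; the inputs are raw bytes, sign-extended as in the pseudocode.

```python
def sext(val, bit):
    # Sign-extend val, treating bit number `bit` as the sign bit.
    val &= (1 << (bit + 1)) - 1
    if val >> bit & 1:
        val -= 1 << (bit + 1)
    return val


def vclip_lane(s1, s2, s3):
    """Return (result, sign flag) for one vclip component."""
    s1, s2, s3 = sext(s1, 7), sext(s2, 7), sext(s3, 7)
    sf = 0
    if s2 < s3:
        start, end = s2, s3          # proper order
    else:
        start, end = s3, s2          # reversed (or empty) range also sets sf
        sf = 1
    res = s1
    if res <= start:
        res, sf = start, 1           # touching an endpoint counts as clipped
    if res >= end:
        res, sf = end, 1
    return res, sf
```

Because the endpoints are reordered when needed, the result is the median of the three inputs. The flag, however, still records that the range was given in reverse order.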


Minimum of absolute values: vminabs 


Performs min(abs(a), abs(b)). Both inputs are treated as signed. The $vc output is supported for the zero flag only. The result is clipped to the 0..0x7f range (which only matters if both inputs are -0x80).

Instructions:
Instruction | Operands | Opcode
vminabs | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0xa5
Operation:
for idx in range(16):
    s1 = sext($v[SRC1][idx], 7)
    s2 = sext($v[SRC2][idx], 7)
    res = min(abs(s1), abs(s2))
    if res > 0x7f:
        res = 0x7f
    $v[DST][idx] = res
    if VCDST < 4:
        $vc[VCDST].sf[idx] = 0
        $vc[VCDST].zf[idx] = res == 0


Add 9-bit: vadd9 


Performs an 8-bit unsigned + 9-bit signed addition (i.e. exactly what's needed for motion compensation). The first source provides the 8-bit inputs, while the second and third are uniquely treated as vectors of 8 16-bit components (of which only the low 9 bits are actually used). The second source provides components 0-7, and the third provides components 8-15. The result is unsigned and clipped. The $vc output is supported, with the sign flag set to 1 if the true result was out of the 8-bit unsigned range.

Instructions:
Instruction | Operands | Opcode
vadd9 | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] $v[SRC3] | 0x9f

Operation:
for idx in range(16):
    # read source 1
    s1 = $v[SRC1][idx]
    if idx < 8:
        # 0-7: SRC2
        s2l = $v[SRC2][idx * 2]
        s2h = $v[SRC2][idx * 2 + 1]
    else:
        # 8-15: SRC3
        s2l = $v[SRC3][(idx - 8) * 2]
        s2h = $v[SRC3][(idx - 8) * 2 + 1]
    # read as 9-bit signed number
    s2 = sext(s2h << 8 | s2l, 8)
    # add
    res = s1 + s2
    # clip
    sf = 0
    if res > 0xff:
        sf = 1
        res = 0xff
    if res < 0:
        sf = 1
        res = 0
    $v[DST][idx] = res
    if VCDST < 4:
        $vc[VCDST].sf[idx] = sf
        $vc[VCDST].zf[idx] = res == 0
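The following Python sketch models vadd9 on plain byte lists. It shows how each 9-bit signed addend is assembled from a little-endian byte pair of source 2 or source 3. `vadd9` is an illustrative function name, and the flag output is returned as a list rather than written to a $vc register.

```python
def sext(val, bit):
    # Sign-extend val, treating bit number `bit` as the sign bit.
    val &= (1 << (bit + 1)) - 1
    if val >> bit & 1:
        val -= 1 << (bit + 1)
    return val


def vadd9(src1, src2, src3):
    """Model of vadd9 on 16-element byte lists; returns (results, sign flags)."""
    res_vec, sf_vec = [], []
    for idx in range(16):
        s1 = src1[idx] & 0xff
        # components 0-7 come from src2, 8-15 from src3, as 16-bit byte pairs
        pair = src2 if idx < 8 else src3
        lo_byte = pair[(idx % 8) * 2]
        hi_byte = pair[(idx % 8) * 2 + 1]
        s2 = sext(hi_byte << 8 | lo_byte, 8)   # only the low 9 bits matter
        res = s1 + s2
        sf = 0
        if res > 0xff:
            sf, res = 1, 0xff
        if res < 0:
            sf, res = 1, 0
        res_vec.append(res)
        sf_vec.append(sf)
    return res_vec, sf_vec
```

For example, a residual stored as the byte pair 0xff, 0x01 (0x1ff, i.e. -1) subtracts one from the pixel, while 0xc8, 0x00 (+200) added to 100 saturates at 0xff and sets the flag.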


Compare with absolute difference: vcmpad

This instruction performs the following operations:

* subtract source 1.1 from source 2
* take the absolute value of the difference
* compare the result with source 1.2
* if equal, set zero flag of selected $vc output
* set sign flag of $vc output to an arbitrary bitwise operation of s2v $vc input and "less than" comparison result

All inputs are treated as unsigned. If the s2v scalar instruction is not used together with this instruction, the $vc input defaults to the sign flag of the $vc register selected as output, with no transformation.

This instruction has two sources: source 1 is a register pair, while source 2 is a single register. The second register of the pair is selected by ORing 1 into the index of the first register of the pair. Source 2 is selected by the mangled field SRC2S.


Instructions:
Instruction | Operands | Opcode
vcmpad | CMPOP [$vc[VCDST]] $v[SRC1]d $v[SRC2S] | 0x8e

Operation:
if s2v.vcsel.valid:
    vcin = s2v.vcmask
else:
    vcin = $vc[VCDST & 3].sf
for idx in range(16):
    ad = abs($v[SRC2S][idx] - $v[SRC1][idx])
    other = $v[SRC1 | 1][idx]
    if VCDST < 4:
        $vc[VCDST].sf[idx] = bitop(CMPOP, vcin >> idx & 1, ad < other)
        $vc[VCDST].zf[idx] = ad == other
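A small Python model of the vcmpad flag outputs, assuming plain lists for the SRC1 pair (`src1a`, `src1b`) and SRC2. Since each flag holds a single bit, this sketch uses the single-bit form `bitop_single` from the notational conventions chapter rather than the full-width `bitop`; the first input is the $vc input bit and the second is the "less than" result, matching the argument order in the pseudocode.

```python
def bitop_single(op, *inputs):
    """One output bit of an X-input bit operation: input k supplies bit k of the mask index."""
    bitidx = 0
    for idx, inp in enumerate(inputs):
        if inp:
            bitidx |= 1 << idx
    return op >> bitidx & 1


def vcmpad(src1a, src1b, src2, cmpop, vcin):
    """Model of vcmpad; returns (sf, zf) lists."""
    sf, zf = [], []
    for idx, (a, other, b) in enumerate(zip(src1a, src1b, src2)):
        ad = abs(b - a)
        sf.append(bitop_single(cmpop, vcin >> idx & 1, ad < other))
        zf.append(int(ad == other))
    return sf, zf


# CMPOP 0xc passes the second input ("less than") through unchanged
print(vcmpad([10, 10, 10], [7, 5, 1], [3, 20, 10], cmpop=0xc, vcin=0))  # ([0, 0, 1], [1, 0, 0])
```

Choosing CMPOP 0x6 instead would XOR the $vc input into the comparison result, and 0x8 would AND it in.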


Bit operations: vbitop 


Performs an arbitrary two-input bit operation on two registers. $vc output supported for zero flag only. 


Instructions:
Instruction | Operands | Opcode
vbitop | BITOP [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0x94

Operation:
for idx in range(16):
    s1 = $v[SRC1][idx]
    s2 = $v[SRC2][idx]
    res = bitop(BITOP, s2, s1) & 0xff
    $v[DST][idx] = res
    if VCDST < 4:
        $vc[VCDST].sf[idx] = 0
        $vc[VCDST].zf[idx] = res == 0
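For experimenting with BITOP values, the `bitop` definition from the notational conventions chapter can be made runnable and applied per component. Note the argument order: SRC2 is the first input, so the mask 0xa (passthrough of the first input) returns SRC2. The list-based register model is an assumption for illustration.

```python
def sext(val, bit):
    """Sign-extend val, treating bit number `bit` as the sign bit."""
    val &= (1 << (bit + 1)) - 1
    return val - (1 << (bit + 1)) if val >> bit & 1 else val


def bitop_single(op, *inputs):
    """One output bit: input k supplies bit k of the index into the op mask."""
    bitidx = 0
    for idx, inp in enumerate(inputs):
        if inp:
            bitidx |= 1 << idx
    return op >> bitidx & 1


def bitop(op, *inputs):
    max_len = max(inp.bit_length() for inp in inputs)
    res = 0
    for x in range(max_len + 1):  # + 1 for the sign bit
        res |= bitop_single(op, *(inp >> x & 1 for inp in inputs)) << x
    # all bits from max_len upward are identical, which is what sext produces
    return sext(res, max_len)


def vbitop(op, src1, src2):
    """Model of vbitop; returns (dst, zf)."""
    dst = [bitop(op, s2, s1) & 0xff for s1, s2 in zip(src1, src2)]
    return dst, [int(r == 0) for r in dst]


print(vbitop(0x8, [0xf0], [0x3c]))  # ([48], [0])
```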


Bit operations with immediate: vand, vor, vxor 


Performs a given bitwise operation on a register and an 8-bit immediate replicated for each component. $vc output is supported for zero flag only.

Instructions:
Instruction | Operands | Opcode
vand | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xaa
vxor | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xab
vor | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xaf

Operation:
for idx in range(16):
    s1 = $v[SRC1][idx]
    if op == 'vand':
        res = s1 & BIMM
    elif op == 'vxor':
        res = s1 ^ BIMM
    else:
        # vor
        res = s1 | BIMM
    $v[DST][idx] = res
    if VCDST < 4:
        $vc[VCDST].sf[idx] = 0
        $vc[VCDST].zf[idx] = res == 0


Shift operations: vshr, vsar 


Performs a SIMD right shift, like the scalar bytewise shift instruction. $vc output is fully supported, with bit 7 of the result used as the sign flag.

Instructions:
Instruction | Operands | Opcode
vsar | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0x8e
vshr | [$vc[VCDST]] $v[DST] $v[SRC1] $v[SRC2] | 0x9e
vsar | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xae
vshr | [$vc[VCDST]] $v[DST] $v[SRC1] BIMM | 0xbe
Operation:
for idx in range(16):
    s1 = $v[SRC1][idx]
    if opcode & 0x20:
        s2 = BIMM
    else:
        s2 = $v[SRC2][idx]
    if opcode & 0x10:
        # unsigned
        s1 &= 0xff
    else:
        # signed
        s1 = sext(s1, 7)
    shift = sext(s2, 3)
    if shift < 0:
        res = s1 << -shift
    else:
        res = s1 >> shift
    $v[DST][idx] = res
    if VCDST < 4:
        $vc[VCDST].sf[idx] = res >> 7 & 1
        $vc[VCDST].zf[idx] = res == 0
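A Python sketch of the shift semantics with a register shift source, useful for seeing how the 4-bit signed shift amount turns negative counts into left shifts. The `& 0xff` on the stored value is an assumption modelling the 8-bit $v component width; the flags are computed on the unmasked result, as in the pseudocode.

```python
def sext(val, bit):
    """Sign-extend val, treating bit number `bit` as the sign bit."""
    val &= (1 << (bit + 1)) - 1
    return val - (1 << (bit + 1)) if val >> bit & 1 else val


def vshift(src1, src2, signed):
    """Model of vsar (signed=True) or vshr (signed=False); returns (dst, sf, zf)."""
    dst, sf, zf = [], [], []
    for a, b in zip(src1, src2):
        s1 = sext(a, 7) if signed else a & 0xff
        shift = sext(b, 3)  # low 4 bits, range -8..7; negative means shift left
        res = s1 << -shift if shift < 0 else s1 >> shift
        dst.append(res & 0xff)  # assumption: only 8 bits are kept in $v
        sf.append(res >> 7 & 1)
        zf.append(int(res == 0))
    return dst, sf, zf


print(vshift([0x80], [2], signed=True))      # ([224], [1], [0])
print(vshift([0x80], [2], signed=False))     # ([32], [0], [0])
print(vshift([0x40], [0x0f], signed=False))  # ([128], [1], [0])
```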


Linear interpolation: vlrp

A SIMD linear interpolation instruction. Takes two sources: a register pair containing the two values to interpolate, and a register containing the interpolation factor. The result is basically SRC1.1 × (SRC2 >> SHIFT) + SRC1.2 × (1 - (SRC2 >> SHIFT)). All inputs are unsigned fractions.

Instructions:
Instruction | Operands | Opcode
vlrp | RND SHIFT $v[DST] $v[SRC1]d $v[SRC2] | 0x90

Operation:
for idx in range(16):
    val1 = $v[SRC1][idx]
    val2 = $v[SRC1 | 1][idx]
    a = mad_expand(val2, 'fract', 'u', SHIFT)
    res = mad(a, val1 - val2, $v[SRC2][idx], 0, 0, RND, 'fract', 'u', SHIFT, 'hi')
    $v[DST][idx] = mad_read(res, 'fract', 'u', SHIFT, 'hi')
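The pseudocode computes val2 + (val1 - val2) × factor rather than the two-product formula in the description; the two are algebraically equal. This exact-arithmetic sketch checks that, under the assumptions SHIFT = 0, no rounding, and an 8-bit unsigned factor meaning factor / 256. It does not model the `mad` helpers.

```python
from fractions import Fraction


def vlrp_ideal(val1, val2, factor):
    """Idealized vlrp for one component (assumes SHIFT = 0, exact arithmetic)."""
    f = Fraction(factor, 256)  # assumption: unsigned 8-bit fraction
    as_pseudocode = val2 + (val1 - val2) * f  # a = val2, plus one product term
    as_formula = val1 * f + val2 * (1 - f)    # formula from the description
    assert as_pseudocode == as_formula
    return as_pseudocode


print(vlrp_ideal(200, 100, 64))  # 125
```

Computing it as a base value plus a single product is what lets the instruction reuse the generic `mad` datapath.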


Multiply and multiply with accumulate: vmul, vmac 


Performs a simple multiplication of two sources (but with the full set of weird options available). The result is either added to the vector accumulator (vmac) or replaces it (vmul). The result can additionally be read to a vector register, but doesn't have to be.

The instructions come in many variants: they can store the result in a vector register or not, have unsigned or signed output, and register or immediate second source. The set of available combinations is incomplete, however: while the $v-writing variants have all combinations available, there are no unsigned variants of register-register vmul with no $v write, nor unsigned register-immediate vmac with no $v write. Also, unsigned register-immediate vmul with no $v output is a bad opcode.


Instructions: 


2.11. Video decoding, encoding, and processing 405 


nVidia Hardware Documentation, Release git 


Instruction | Operands | Opcode
vmul s | RND FRACTINT SHIFT HILO # SIGN1 $v[SRC1] SIGN2 $v[SRC2] | 0x80
vmul s | RND FRACTINT SHIFT HILO # SIGN1 $v[SRC1] SIGN2 BIMMMUL | 0xa0
vmul u | RND FRACTINT SHIFT HILO # SIGN1 $v[SRC1] SIGN2 BIMMBAD | 0xb0 (bad opcode)
vmul s | RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1] SIGN2 $v[SRC2] | 0x81
vmul u | RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1] SIGN2 $v[SRC2] | 0x91
vmul s | RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1] SIGN2 BIMMMUL | 0xa1
vmul u | RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1] SIGN2 BIMMMUL | 0xb1
vmac s | RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1] SIGN2 $v[SRC2] | 0x82
vmac u | RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1] SIGN2 $v[SRC2] | 0x92
vmac s | RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1] SIGN2 BIMMMUL | 0xa2
vmac u | RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1] SIGN2 BIMMMUL | 0xb2
vmac s | RND FRACTINT SHIFT HILO # SIGN1 $v[SRC1] SIGN2 $v[SRC2] | 0x83
vmac u | RND FRACTINT SHIFT HILO # SIGN1 $v[SRC1] SIGN2 $v[SRC2] | 0x93
vmac s | RND FRACTINT SHIFT HILO # SIGN1 $v[SRC1] SIGN2 BIMMMUL | 0xa3
Operation:
for idx in range(16):
    # read inputs
    s1 = $v[SRC1][idx]
    if opcode & 0x20:
        if opcode == 0xb0:
            s2 = BIMMBAD
        else:
            s2 = BIMMMUL << 2
    else:
        s2 = $v[SRC2][idx]
    # convert inputs
    s1 = mad_input(s1, FRACTINT, SIGN1)
    s2 = mad_input(s2, FRACTINT, SIGN2)
    # do the computation
    if op == 'vmac':
        a = $va[idx]
    else:
        a = 0
    res = mad(a, s1, s2, 0, 0, RND, FRACTINT, op.sign, SHIFT, HILO)
    # write result
    $va[idx] = res
    if DST is not None:
        $v[DST][idx] = mad_read(res, FRACTINT, op.sign, SHIFT, HILO)


Dual multiply and add/accumulate: vmac2, vmad2 


Performs two multiplications and adds the result to a given source or to the vector accumulator. The result is written 
to the vector accumulator and can also be written to a $v register. For each multiplication, one input is a register 
source, and the other is s2v factor. The register sources for the multiplications are a register pair. The s2v sources for 
the multiplications are either s2v factors (one factor from each pair is selected according to s2v $vc input) or 0/1 as 
decided by s2v mask. 


The instructions come in signed and unsigned variants. Apart from some bad opcodes (which overlay SRC3 with mad 
param fields), only $v writing versions have unsigned variants. 


Instructions:
Instruction | Operands | Opcode
vmad2 s | S2VMODE RND FRACTINT SHIFT HILO # SIGN1 $v[SRC1]d SIGN2 $v[SRC2] | 0x84
vmad2 s | S2VMODE RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1]d SIGN2 $v[SRC2] | 0x85
vmad2 u | S2VMODE RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1]d SIGN2 $v[SRC2] | 0x95
vmac2 s | S2VMODE RND FRACTINT SHIFT HILO # SIGN1 $v[SRC1]d | 0x86
vmac2 u | S2VMODE RND FRACTINT SHIFT HILO # SIGN1 $v[SRC1] $v[SRC3] | 0x96 (bad opcode)
vmac2 s | S2VMODE RND FRACTINT SHIFT HILO # SIGN1 $v[SRC1] $v[SRC3] | 0xa6 (bad opcode)
vmac2 s | S2VMODE RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1]d | 0x87
vmac2 u | S2VMODE RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1]d | 0x97
vmac2 s | S2VMODE RND FRACTINT SHIFT HILO $v[DST] SIGN1 $v[SRC1] $v[SRC3] | 0xa7 (bad opcode)
Operation:
for idx in range(16):
    # read inputs
    s11 = $v[SRC1][idx]
    if opcode in (0x96, 0xa6, 0xa7):
        # one of the bad opcodes
        s12 = $v[SRC3][idx]
    else:
        s12 = $v[SRC1 | 1][idx]
    s2 = $v[SRC2][idx]
    # convert inputs
    s11 = mad_input(s11, FRACTINT, SIGN1)
    s12 = mad_input(s12, FRACTINT, SIGN1)
    s2 = mad_input(s2, FRACTINT, SIGN2)
    # prepare A value
    if op == 'vmad2':
        a = mad_expand(s2, FRACTINT, op.sign, SHIFT)
    else:
        a = $va[idx]
    # prepare factors
    if S2VMODE == 'mask':
        if s2v.mask[0] & 1 << idx:
            f1 = 0x100
        else:
            f1 = 0
        if s2v.mask[1] & 1 << idx:
            f2 = 0x100
        else:
            f2 = 0
    else:
        # 'factor'
        cc = s2v.vcmask >> idx & 1
        f1 = s2v.factor[0 | cc]
        f2 = s2v.factor[2 | cc]
    # do the operation
    res = mad(a, s11, f1, s12, f2, RND, FRACTINT, op.sign, SHIFT, HILO)
    # write result
    $va[idx] = res
    if DST is not None:
        $v[DST][idx] = mad_read(res, FRACTINT, op.sign, SHIFT, HILO)


Dual linear interpolation: vlrp2

This instruction performs the following steps:

* read a quad register source selected by SRC1
* rotate the source quad by the amount selected by bits 4-5 of a selected $c register
* for each component:
  - treat register 0 of the quad as function value at (0, 0)
  - treat register 2 as value at (1, 0)
  - treat register 3 as value at (0, 1)
  - select a pair of factors from s2v input based on selected flag of selected $vc register
  - treat the factors as a coordinate pair and interpolate function value at these coordinates
  - write result to $v register and optionally $va

The inputs and outputs may be signed or unsigned. A shift and rounding mode can be selected. Additionally, there's an option to XOR register 0 with 0x80 before use as the base value (but not for the differences used in interpolation). Don't ask me.


Instructions:
Instruction | Operands | Opcode
vlrp2 | SIGND VAWRITE RND SHIFT $v[DST] SIGNS LRP2X $v[SRC1]q $c[COND] $vc[VCSRC] VCSEL | 0xb3

Operation:
# a function selecting the factors
def get_lrp2_factors(idx):
    if VCSEL == 'sf':
        vcmask = $vc[VCSRC].sf
    else:
        vcmask = $vc[VCSRC].zf
    cc = vcmask >> idx & 1
    f1 = s2v.factor[0 | cc]
    f2 = s2v.factor[2 | cc]
    return f1, f2

# determine rotation
rot = $c[COND] >> 4 & 3

for idx in range(16):
    # read inputs, maybe do the xor
    s10x = s10 = $v[(SRC1 & 0x1c) | ((SRC1 + rot) & 3)][idx]
    s12 = $v[(SRC1 & 0x1c) | ((SRC1 + rot + 2) & 3)][idx]
    s13 = $v[(SRC1 & 0x1c) | ((SRC1 + rot + 3) & 3)][idx]
    if LRP2X:
        s10x ^= 0x80
    # convert inputs if necessary
    s10 = mad_input(s10, 'fract', SIGNS)
    s12 = mad_input(s12, 'fract', SIGNS)
    s13 = mad_input(s13, 'fract', SIGNS)
    s10x = mad_input(s10x, 'fract', SIGNS)
    # do it
    a = mad_expand(s10x, 'fract', SIGND, SHIFT)
    f1, f2 = get_lrp2_factors(idx)
    res = mad(a, s12 - s10, f1, s13 - s10, f2, RND, 'fract', SIGND, SHIFT, 'hi')
    # write outputs
    if VAWRITE:
        $va[idx] = res
    $v[DST][idx] = mad_read(res, 'fract', SIGND, SHIFT, 'hi')
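Two parts of vlrp2 can be checked in isolation: which registers the rotated quad reads, and what surface the factors sample. The sketch below uses exact fractions for the factors and ignores SIGNS/SIGND, SHIFT, rounding and LRP2X (all assumptions); `quad_regs` and `vlrp2_ideal` are hypothetical names. The result is the plane through (0, 0), (1, 0) and (0, 1), evaluated at (f1, f2).

```python
from fractions import Fraction


def quad_regs(src1, rot):
    """Register indices read as quad registers 0, 2 and 3 after rotation."""
    base = src1 & 0x1c
    return [base | ((src1 + rot + k) & 3) for k in (0, 2, 3)]


def vlrp2_ideal(v0, v2, v3, f1, f2):
    """Idealized vlrp2 for one component: v0 at (0,0), v2 at (1,0), v3 at (0,1)."""
    return v0 + (v2 - v0) * f1 + (v3 - v0) * f2


print(quad_regs(8, 1))                                                # [9, 11, 8]
print(vlrp2_ideal(10, 30, 50, Fraction(1, 2), Fraction(1, 2)))  # 40
```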


Quad linear interpolation, part 1: vlrp4a

Works like the previous variant, but only outputs to $va and lacks some flags. Both outputs and inputs are unsigned.

Instructions:
Instruction | Operands | Opcode
vlrp4a | RND SHIFT # $v[SRC1]q $c[COND] $vc[VCSRC] VCSEL | 0xb4
Operation:
rot = $c[COND] >> 4 & 3
for idx in range(16):
    s10 = $v[(SRC1 & 0x1c) | ((SRC1 + rot) & 3)][idx]
    s12 = $v[(SRC1 & 0x1c) | ((SRC1 + rot + 2) & 3)][idx]
    s13 = $v[(SRC1 & 0x1c) | ((SRC1 + rot + 3) & 3)][idx]
    a = mad_expand(s10, 'fract', 'u', SHIFT)
    f1, f2 = get_lrp2_factors(idx)
    $va[idx] = mad(a, s12 - s10, f1, s13 - s10, f2, RND, 'fract', 'u', SHIFT, 'lo')


Factor linear interpolation: vlrpf

Has similar input processing to vlrp2, but instead uses source 1 registers 2 and 3 to interpolate s2v input. The result is SRC2 + SRC1.2 × F1 + SRC1.3 × (F2 - F1).

Instructions:
Instruction | Operands | Opcode
vlrpf | RND SHIFT # $v[SRC1]q $c[COND] $v[SRC2] $vc[VCSRC] VCSEL | 0xb5
Operation:
rot = $c[COND] >> 4 & 3
for idx in range(16):
    s12 = $v[(SRC1 & 0x1c) | ((SRC1 + rot + 2) & 3)][idx]
    s13 = $v[(SRC1 & 0x1c) | ((SRC1 + rot + 3) & 3)][idx]
    s2 = sext($v[SRC2][idx], 7)
    a = mad_expand(s2, 'fract', 'u', SHIFT)
    f1, f2 = get_lrp2_factors(idx)
    $va[idx] = mad(a, s12 - s13, f1, s13, f2, RND, 'fract', 'u', SHIFT, 'lo')


Quad linear interpolation, part 2: vlrp4b

Can be used together with vlrp4a for quad linear interpolation. The first s2v factor is the interpolation coefficient for register 1, and the second factor is the interpolation coefficient for the extra register ($vx).

Alternatively, it can be coupled with vlrpf.

Instructions:
Instruction | Operands | Opcode
vlrp4b u | ALTRND ALTSHIFT $v[DST] $v[SRC1]q $c[COND] SLCT $vc[VCSRC] VCSEL | 0xb6
vlrp4b s | ALTRND ALTSHIFT $v[DST] $v[SRC1]q $c[COND] SLCT $vc[VCSRC] VCSEL | 0xb7

Operation:
for idx in range(16):
    if SLCT ==
        rot = $c[COND] >> 4 & 3
        s10 = $v[(SRC1 & 0x1c) | ((SRC1 + rot) & 3)][idx]
        s11 = $v[(SRC1 & 0x1c) | ((SRC1 + rot + 1) & 3)][idx]
    else:
        adjust = $c[COND] >> SLCT & 1
        s10 = s11 = $v[SRC1 ^ adjust][idx]
    f1, f2 = get_lrp2_factors(idx)
    res = mad($va[idx], s11 - s10, f1, $vx[idx] - s10, f2, ALTRND, 'fract', op.sign, ALTSHIFT, 'hi')
    $va[idx] = res
    $v[DST][idx] = mad_read(res, 'fract', op.sign, ALTSHIFT, 'hi')

Branch unit


Contents 


* Branch unit 


— Introduction 


- Branch registers 


Todo: write me 


Introduction 


Todo: write me 


Branch registers 
Todo: write me


Address unit 


Contents

* Address unit
  - Introduction
  - The data store
  - Address registers
  - Instruction format
  - Opcodes
  - Instructions
    * Set low/high bits: setlo, sethi
    * Addition: add
    * Bit operations: bitop
    * Address addition: aadd
    * Load: ldvh, ldvv, lds
    * Load and add: ldavh, ldavv, ldas
    * Store: stvh, stvv, sts
    * Store and add: stavh, stavv, stas
    * Load raw: ldr
    * Store raw and add: star
    * Load extra and add: ldaxh, ldaxv


Introduction 


The address unit is one of the four execution units of VP1. It transfers data between the data store and registers, controls the DMA unit, and performs address calculations.


The data store 


The data store is the working memory of VP1, 8kB in size. Data can be transferred between the data store and $r/$v registers using load/store instructions, or between the data store and main memory using the DMA engine. It's often treated as two-dimensional, with row stride selectable between 0x10, 0x20, 0x40, and 0x80 bytes: there are "load vertical" instructions which gather consecutive bytes vertically rather than horizontally.

Because of its 2D capabilities, the data store is internally organized into 16 independently addressable 16-bit wide banks of 256 cells each, and the memory addresses are carefully spread between the banks so that both horizontal and vertical loads from any address will require at most one access to every bank. The bank assignments differ between the supported strides, so row stride is basically a part of the address, and an area of memory always has to be accessed with the same stride (unless you don't care about its previous contents). Specifically, the translation of an (address, stride) pair into (bank, cell index, high/low byte) is as follows:


def address_xlat(addr, stride):
    bank = addr & 0xf
    hilo = addr >> 4 & 1
    cell = addr >> 5 & 0xff
    if stride == 0:
        # 0x10 bytes
        bank += (addr >> 5) & 7
    elif stride == 1:
        # 0x20 bytes
        bank += addr >> 5
    elif stride == 2:
        # 0x40 bytes
        bank += addr >> 6
    elif stride == 3:
        # 0x80 bytes
        bank += addr >> 7
    bank &= 0xf
    return bank, cell, hilo


In pseudocode, data store bytes are denoted by DS[bank, cell, hilo].


In case of vertical access with 0x10 bytes stride, all 16 bits of 8 banks will be used by a 16-byte access. In all other cases, 8 bits of all 16 banks will be used for such access. DMA transfers can make use of the full 256-bit width of the data store, by transmitting 0x20 consecutive bytes at a time.


The data store can be accessed by load/store instructions in one of four ways: 


* horizontal: 16 consecutive naturally aligned addresses are used:

    def addresses_horizontal(addr, stride):
        addr &= 0x1ff0
        return [address_xlat(addr | idx, stride) for idx in range(16)]

* vertical: 16 addresses separated by stride bytes are used, also naturally aligned:

    def addresses_vertical(addr, stride):
        addr &= 0x1fff
        # clear the bits used for y coord
        addr &= ~(0xf << (4 + stride))
        return [address_xlat(addr | idx << (4 + stride), stride) for idx in range(16)]

* scalar: like horizontal, but 4 bytes:

    def addresses_scalar(addr, stride):
        addr &= 0x1ffc
        return [address_xlat(addr | idx, stride) for idx in range(4)]

* raw: the raw data store coordinates are provided directly
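The claim that horizontal and vertical accesses need at most one access per bank can be checked mechanically. This sketch reuses the translation and access-pattern functions above (with `address_xlat` condensed, since strides 1-3 shift by 5, 6 and 7), and adds a hypothetical `is_conflict_free` helper that treats "one access" as one 16-bit cell per bank, so using both bytes of the same cell counts once. The stride argument is assumed to be the 2-bit code 0..3 from the $a register.

```python
def address_xlat(addr, stride):
    """Return (bank, cell, hilo) for a data store address; stride is the 2-bit code 0..3."""
    bank = addr & 0xf
    hilo = addr >> 4 & 1
    cell = addr >> 5 & 0xff
    if stride == 0:
        bank += (addr >> 5) & 7
    else:
        # strides 1, 2, 3 (0x20, 0x40, 0x80 bytes) rotate by addr >> 5, >> 6, >> 7
        bank += addr >> (4 + stride)
    return bank & 0xf, cell, hilo


def addresses_horizontal(addr, stride):
    addr &= 0x1ff0
    return [address_xlat(addr | idx, stride) for idx in range(16)]


def addresses_vertical(addr, stride):
    addr &= 0x1fff
    addr &= ~(0xf << (4 + stride))  # clear the bits used for the y coordinate
    return [address_xlat(addr | idx << (4 + stride), stride) for idx in range(16)]


def addresses_scalar(addr, stride):
    addr &= 0x1ffc
    return [address_xlat(addr | idx, stride) for idx in range(4)]


def is_conflict_free(cells):
    """True if every bank is accessed in at most one cell (one 16-bit access per bank)."""
    seen = {}
    for bank, cell, _hilo in cells:
        if seen.setdefault(bank, cell) != cell:
            return False
    return True


for stride in range(4):
    for addr in range(0, 0x2000, 0x35):
        assert is_conflict_free(addresses_horizontal(addr, stride))
        assert is_conflict_free(addresses_vertical(addr, stride))
print(sorted({bank for bank, _, _ in addresses_vertical(0x123, 0)}))  # [3, 4, 5, 6, 7, 8, 9, 10]
```

The last line illustrates the 0x10-stride special case: a vertical access touches only 8 banks, using both bytes of each.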


Address registers 


The address unit has 32 address registers, $a0-$a31. These are used for address storage. If they're used to store data store addresses (and not DMA command parameters), they have the following bitfields:

* bits 0-15: addr - the actual data store address
* bits 16-29: limit - can store the high boundary of an array, to assist in looping
* bits 30-31: stride - selects data store stride:
  - 0: 0x10 bytes
  - 1: 0x20 bytes
  - 2: 0x40 bytes
  - 3: 0x80 bytes

There are also 3 bits in each $c register belonging to the address unit. They are:

* bits 8-9: long address flags
  - bit 8: sign flag - set equal to bit 31 of the result
  - bit 9: zero flag - set if the result is 0
* bit 10: short address flag
  - bit 10: end flag - set if addr field of the result is greater than or equal to limit
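The $a bitfield layout above can be expressed as a pair of pack/unpack helpers. `decode_areg`, `encode_areg` and `STRIDE_BYTES` are hypothetical names used only for illustration.

```python
STRIDE_BYTES = (0x10, 0x20, 0x40, 0x80)


def decode_areg(value):
    """Split a 32-bit $a value into its addr, limit and stride fields."""
    return {
        "addr": value & 0xffff,
        "limit": value >> 16 & 0x3fff,
        "stride": STRIDE_BYTES[value >> 30 & 3],
    }


def encode_areg(addr, limit, stride_code):
    """Inverse of decode_areg; stride_code is the raw 2-bit field."""
    return (stride_code & 3) << 30 | (limit & 0x3fff) << 16 | (addr & 0xffff)


value = encode_areg(0x100, 0x200, 1)
print(hex(value))          # 0x42000100
print(decode_areg(value))  # {'addr': 256, 'limit': 512, 'stride': 32}
```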


Some address instructions set either the long or short flags of a given $c register according to the result. 


Instruction format 


The instruction word fields used in address instructions in addition to the ones used in scalar instructions are:

* bit 0: for opcode 0xd7, selects the subopcode:
  - 0: load raw: ldr
  - 1: store raw and add: star
* bits 3-13: UIMM: unsigned 11-bit immediate.


Todo: list me 


Opcodes 


The opcode range assigned to the address unit is 0xc0-0xdf. The opcodes are:

* 0xc0: load vector horizontal and add: ldavh
* 0xc1: load vector vertical and add: ldavv
* 0xc2: load scalar and add: ldas
* 0xc3: ??? (xdld)
* 0xc4: store vector horizontal and add: stavh
* 0xc5: store vector vertical and add: stavv
* 0xc6: store scalar and add: stas
* 0xc7: ??? (xdst)
* 0xc8: load extra horizontal and add: ldaxh
* 0xc9: load extra vertical and add: ldaxv
* 0xca: address addition: aadd
* 0xcb: addition: add
* 0xcc: set low bits: setlo
* 0xcd: set high bits: sethi
* 0xce: ??? (xdbar)
* 0xcf: ??? (xdwait)
* 0xd0: load vector horizontal and add: ldavh
* 0xd1: load vector vertical and add: ldavv
* 0xd2: load scalar and add: ldas
* 0xd3: bitwise operation: bitop
* 0xd4: store vector horizontal and add: stavh
* 0xd5: store vector vertical and add: stavv
* 0xd6: store scalar and add: stas
* 0xd7: depending on instruction bit 0:
  - 0: load raw: ldr
  - 1: store raw and add: star
* 0xd8: load vector horizontal: ldvh
* 0xd9: load vector vertical: ldvv
* 0xda: load scalar: lds
* 0xdb: ???
* 0xdc: store vector horizontal: stvh
* 0xdd: store vector vertical: stvv
* 0xde: store scalar: sts
* 0xdf: the canonical address nop opcode

Todo: complete the list


Instructions 
Set low/high bits: setlo, sethi 


Sets low or high 16 bits of a register to an immediate value. The other half is unaffected. 


Instructions:
Instruction | Operands | Opcode
setlo | $a[DST] IMM16 | 0xcc
sethi | $a[DST] IMM16 | 0xcd
Operation:
if op == 'setlo':
    $a[DST] = ($a[DST] & 0xffff0000) | IMM16
else:
    $a[DST] = ($a[DST] & 0xffff) | IMM16 << 16
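Since $a registers are 32 bits wide and each of these instructions writes one half, a full data store address register (addr, limit and stride) can be built with one setlo plus one sethi. The Python functions below mirror the pseudocode; the example field values are arbitrary.

```python
def setlo(reg, imm16):
    """setlo: replace bits 0-15, keep bits 16-31."""
    return (reg & 0xffff0000) | (imm16 & 0xffff)


def sethi(reg, imm16):
    """sethi: replace bits 16-31, keep bits 0-15."""
    return (reg & 0xffff) | (imm16 & 0xffff) << 16


# addr = 0x400, limit = 0x800, stride code 3 (0x80 bytes);
# the stride field occupies bits 14-15 of the high half
reg = sethi(setlo(0, 0x400), 3 << 14 | 0x800)
print(hex(reg))  # 0xc8000400
```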
Addition: add

Does what it says on the tin. The second source comes from a mangled register index. The long address flags are set.

Instructions:
Instruction | Operands | Opcode
add | [$c[CDST]] $a[DST] $a[SRC1] $a[SRC2S] | 0xcb
Operation:
res = $a[SRC1] + $a[SRC2S]
$a[DST] = res
cres = 0
if res & 1 << 31:
    cres |= 1
if res == 0:
    cres |= 2
if CDST < 4:
    $c[CDST].address.long = cres
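A sketch of the long-flag computation for add. Bit 0 of `cres` is the sign flag ($c bit 8) and bit 1 is the zero flag ($c bit 9). The pseudocode does not show a mask, so the 32-bit wraparound of the sum is an assumption here.

```python
def long_flags(res):
    """Long address flags: bit 0 = sign (bit 31 of res), bit 1 = zero."""
    cres = 0
    if res & 1 << 31:
        cres |= 1
    if res == 0:
        cres |= 2
    return cres


def add(a, b):
    res = (a + b) & 0xffffffff  # assumption: the sum wraps to 32 bits
    return res, long_flags(res)


print(add(0x7fffffff, 1))  # (2147483648, 1)
print(add(0xffffffff, 1))  # (0, 2)
```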


Bit operations: bitop 


Performs an arbitrary two-input bit operation on two registers, selected by SRC1 and SRC2. The long address flags are set.

Instructions:
Instruction | Operands | Opcode
bitop | BITOP [$c[CDST]] $a[DST] $a[SRC1] $a[SRC2] | 0xd3
Operation:
res = bitop(BITOP, $a[SRC2], $a[SRC1]) & 0xffffffff
$a[DST] = res
cres = 0
if res & 1 << 31:
    cres |= 1
if res == 0:
    cres |= 2
if CDST < 4:
    $c[CDST].address.long = cres
Address addition: aadd

Adds the contents of a register to the addr field of another register. The short address flag is set.

Instructions:
Instruction | Operands | Opcode
aadd | [$c[CDST]] $a[DST] $a[SRC2S] | 0xca

Operation:
$a[DST].addr += $a[SRC2S]
if CDST < 4:
    $c[CDST].address.short = $a[DST].addr >= $a[DST].limit
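The limit field and end flag are meant to assist in looping; this sketch shows a pointer stepping through an array until the end flag fires. It assumes the addr field wraps at 16 bits, that the limit and stride fields are left untouched, and that the increment register holds only a small step value; all three are assumptions, not documented behaviour.

```python
def aadd(reg, increment):
    """Model of aadd on a 32-bit $a value; returns (new_reg, end_flag)."""
    addr = ((reg & 0xffff) + increment) & 0xffff  # assumption: the field wraps at 16 bits
    reg = (reg & ~0xffff) | addr                   # limit and stride are preserved
    limit = reg >> 16 & 0x3fff
    return reg, addr >= limit


reg = 0x40 << 16  # addr = 0, limit = 0x40, stride code 0
steps, end = 0, False
while not end:
    reg, end = aadd(reg, 0x10)
    steps += 1
print(steps)  # 4
```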


Load: ldvh, ldvv, lds

Loads from the given address ORed with an unsigned 11-bit immediate. ldvh is a horizontal vector load, ldvv is a vertical vector load, and lds is a scalar load. Curiously, while the register is ORed with the immediate to form the address, the two are added to compute the $c output.

Instructions:
Instruction | Operands | Opcode
ldvh | $v[DST] [$c[CDST]] $a[SRC1] UIMM | 0xd8
ldvv | $v[DST] [$c[CDST]] $a[SRC1] UIMM | 0xd9
lds | $r[DST] [$c[CDST]] $a[SRC1] UIMM | 0xda
Operation:
if op == 'ldvh':
    addr = addresses_horizontal($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldvv':
    addr = addresses_vertical($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'lds':
    addr = addresses_scalar($a[SRC1].addr | UIMM, $a[SRC1].stride)
    for idx in range(4):
        $r[DST][idx] = DS[addr[idx]]
if CDST < 4:
    $c[CDST].address.short = (($a[SRC1].addr + UIMM) & 0xffff) >= $a[SRC1].limit


Load and add: ldavh, ldavv, ldas

Loads from the given address, then post-increments the address by the contents of a register (like the aadd instruction) or an immediate. ldavh is a horizontal vector load, ldavv is a vertical vector load, and ldas is a scalar load.

Instructions:
Instruction | Operands | Opcode
ldavh | $v[DST] [$c[CDST]] $a[SRC1] $a[SRC2S] | 0xc0
ldavv | $v[DST] [$c[CDST]] $a[SRC1] $a[SRC2S] | 0xc1
ldas | $r[DST] [$c[CDST]] $a[SRC1] $a[SRC2S] | 0xc2
ldavh | $v[DST] [$c[CDST]] $a[SRC1] IMM | 0xd0
ldavv | $v[DST] [$c[CDST]] $a[SRC1] IMM | 0xd1
ldas | $r[DST] [$c[CDST]] $a[SRC1] IMM | 0xd2

Operation:
if op == 'ldavh':
    addr = addresses_horizontal($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldavv':
    addr = addresses_vertical($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $v[DST][idx] = DS[addr[idx]]
elif op == 'ldas':
    addr = addresses_scalar($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(4):
        $r[DST][idx] = DS[addr[idx]]
if IMM is None:
    $a[SRC1].addr += $a[SRC2S]
else:
    $a[SRC1].addr += IMM
if CDST < 4:
    $c[CDST].address.short = $a[SRC1].addr >= $a[SRC1].limit


Store: stvh, stvv, sts

Like the corresponding ld* instructions, but store instead of load. SRC1 and DST fields are exchanged.

Instructions:
Instruction | Operands | Opcode
stvh | $v[SRC1] [$c[CDST]] $a[DST] UIMM | 0xdc
stvv | $v[SRC1] [$c[CDST]] $a[DST] UIMM | 0xdd
sts | $r[SRC1] [$c[CDST]] $a[DST] UIMM | 0xde
Operation:
if op == 'stvh':
    addr = addresses_horizontal($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stvv':
    addr = addresses_vertical($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'sts':
    addr = addresses_scalar($a[DST].addr | UIMM, $a[DST].stride)
    for idx in range(4):
        DS[addr[idx]] = $r[SRC1][idx]
if CDST < 4:
    $c[CDST].address.short = (($a[DST].addr + UIMM) & 0xffff) >= $a[DST].limit


Store and add: stavh, stavv, stas

Like the corresponding lda* instructions, but store instead of load. SRC1 and DST fields are exchanged.

Instructions:
Instruction | Operands | Opcode
stavh | $v[SRC1] [$c[CDST]] $a[DST] $a[SRC2S] | 0xc4
stavv | $v[SRC1] [$c[CDST]] $a[DST] $a[SRC2S] | 0xc5
stas | $r[SRC1] [$c[CDST]] $a[DST] $a[SRC2S] | 0xc6
stavh | $v[SRC1] [$c[CDST]] $a[DST] IMM | 0xd4
stavv | $v[SRC1] [$c[CDST]] $a[DST] IMM | 0xd5
stas | $r[SRC1] [$c[CDST]] $a[DST] IMM | 0xd6

Operation:
if op == 'stavh':
    addr = addresses_horizontal($a[DST].addr, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stavv':
    addr = addresses_vertical($a[DST].addr, $a[DST].stride)
    for idx in range(16):
        DS[addr[idx]] = $v[SRC1][idx]
elif op == 'stas':
    addr = addresses_scalar($a[DST].addr, $a[DST].stride)
    for idx in range(4):
        DS[addr[idx]] = $r[SRC1][idx]
if IMM is None:
    $a[DST].addr += $a[SRC2S]
else:
    $a[DST].addr += IMM
if CDST < 4:
    $c[CDST].address.short = $a[DST].addr >= $a[DST].limit


Load raw: ldr

A raw load instruction. Loads one byte from each bank of the data store. The banks correspond directly to destination register components. The addresses are composed from ORing an address register with components of a vector register shifted left by 4 bits. Specifically, for each component, the byte to access is determined as follows:

* take address register value
* shift it right 4 bits (they're discarded)
* OR with the corresponding component of vector source register
* bit 0 of the result selects low/high byte of the bank
* bits 1-8 of the result select the cell index in the bank

This instruction shares the 0xd7 opcode with star. They are differentiated by instruction word bit 0, set to 0 in case of ldr.

Instructions:
Instruction | Operands | Opcode
ldr | $v[DST] $a[SRC1] $v[SRC2] | 0xd7.0

Operation:
for idx in range(16):
    addr = $a[SRC1].addr >> 4 | $v[SRC2][idx]
    $v[DST][idx] = DS[idx, addr >> 1 & 0xff, addr & 1]
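The step list above maps directly to a small coordinate computation. This sketch (hypothetical helper name, with `offsets` standing in for the components of $v[SRC2]) returns the raw (bank, cell, hilo) triple read for each component. Because component idx always reads bank idx, a raw load can never hit the same bank twice.

```python
def ldr_coords(areg_addr, offsets):
    """(bank, cell, hilo) read by ldr for each component; offsets model $v[SRC2]."""
    coords = []
    for bank, off in enumerate(offsets):
        addr = areg_addr >> 4 | off
        coords.append((bank, addr >> 1 & 0xff, addr & 1))
    return coords


print(ldr_coords(0x100, [0, 1, 2, 3]))  # [(0, 8, 0), (1, 8, 1), (2, 9, 0), (3, 9, 1)]
```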


Store raw and add: star

A raw store instruction. Stores one byte to each bank of the data store. As opposed to raw load, the addresses aren't controllable per component: the same byte and cell index is accessed in each bank, and it's selected by the post-incremented address register like for sta*. $c output is not supported.

This instruction shares the 0xd7 opcode with ldr. They are differentiated by instruction word bit 0, set to 1 in case of star.

Instructions:
Instruction | Operands | Opcode
star | $v[SRC1] $a[DST] $a[SRC2S] | 0xd7.1

Operation:
for idx in range(16):
    addr = $a[DST].addr >> 4
    DS[idx, addr >> 1 & 0xff, addr & 1] = $v[SRC1][idx]
$a[DST].addr += $a[SRC2S]


Load extra and add: ldaxh, ldaxv

Like ldav*, except the data is loaded to $vx. If a selected $c flag is set (the same one as used for SRC2S mangling), the same data is also loaded to a $v register selected by the DST field, mangled in the same way as in the vlrp2 family of instructions.

Instructions:
Instruction | Operands | Opcode
ldaxh | $v[DST]q [$c[CDST]] $a[SRC1] $a[SRC2S] | 0xc8
ldaxv | $v[DST]q [$c[CDST]] $a[SRC1] $a[SRC2S] | 0xc9


420 Chapter 2. nVidia hardware documentation 




Operation:
if op == 'ldaxh':
    addr = addresses_horizontal($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $vx[idx] = DS[addr[idx]]
elif op == 'ldaxv':
    addr = addresses_vertical($a[SRC1].addr, $a[SRC1].stride)
    for idx in range(16):
        $vx[idx] = DS[addr[idx]]
if $c[COND] & 1 << SLCT:
    for idx in range(16):
        $v[(DST & 0x1c) | ((DST + ($c[COND] >> 4)) & 3)][idx] = $vx[idx]
$a[SRC1].addr += $a[SRC2S]
if CDST < 4:
    $c[CDST].address.short = $a[SRC1].addr >= $a[SRC1].limit


DMA transfers 


Contents 


* DMA transfers 


— Introduction 


— DMA registers 


Todo: write me 


Introduction 


Todo: write me 


DMA registers 


Todo: write me 


FIFO interface 
Contents


* FIFO interface 
— Introduction 


— Method registers 


— FIFO access registers 


Todo: write me 


Introduction 


Todo: write me 


Method registers 


Todo: write me 


FIFO access registers 


Todo: write me 


Introduction 


Todo: write me 


2.11.2 VP2/VP3 vuc processor


Contents: 


Overview of VP2/VP3/VP4 vuc hardware


Contents 


* Overview of VP2/VP3/VP4 vuc hardware
  - Introduction
  - The MMIO registers - VP2
  - The MMIO registers - VP3/VP4
  - Interrupts


Introduction 


vuc is a microprocessor unit used as the second stage of the VP2 [in H.264 mode only], VP3 and VP4 video decoding 
pipelines. The same name is also used to refer to the instruction set of this microprocessor. vuc’s task is to read 
decoded bitstream data written by VLD into the MBRING structure, do any required calculations on this data, then 
construct instructions for the VP stage regarding processing of the incoming macroblocks. The work required of vuc is 
dependent on the codec and may include eg. motion vector derivation, calculating quantization parameters, converting 
macroblock type to prediction modes, etc. 


On VP2, the vuc is located inside the PBSP engine [see vdec/vp2/pbsp.txt]. On VP3 and VP4, it is located inside the PPDEC engine [see vdec/vp3/ppdec.txt].


The vuc unit is made of the following subunits:

* the vuc microprocessor - oversees everything and does the calculations that are not performance-sensitive enough to be done in hardware
* MBRING input and parsing circuitry - reads bitstream data parsed by the VLD
* MVSURF input and output circuitry - the MVSURF is a storage buffer attached to all reference pictures in H.264 and to P pictures in VC-1, MPEG-4. It stores the motion vectors and other data used for direct prediction in B pictures. There are two MVSURFs that can be used: the output MVSURF that will store the data of the current picture, and the input MVSURF that should store the data for the first picture in L1 list [H.264] or the last P picture [other codecs]
* VPRINGs output circuitry [VP2 only] - the VPRINGs are ring buffers filled by vuc with instructions for various VP subunits. There are three VPRINGs: VPRING_DEBLOCK used for deblocking commands, VPRING_RESIDUAL used for the residual transform coefficients, and VPRING_CTRL used for the motion vectors and other control data.
* direct VP connection [VP3, VP4 only] - the VP3+ vuc is directly connected to the VP engine, instead of relying on ring buffers in memory.


The MMIO registers - VP2 


The vuc registers are located in PBSP XLMI space at addresses 0x08000:0x10000 [BAR0 addresses 0x103200:0x103400]. They are:

08000:0a000/103200:103280: DATA - vuc microprocessor data space [vdec/vuc/isa.txt]
0a000/103280: ICNT - executed instructions counter, aliased to vuc special register $sr15 [$icnt]
0a100/103284: WDCNT - watchdog count - when ICNT reaches WDCNT value and WDCNT is not equal to 0xffff, a watchdog interrupt is raised
0a200/103288: CODE_CONTROL - code execution control [vdec/vuc/isa.txt]
0a300/10328c: CODE_WINDOW - code access window [vdec/vuc/isa.txt]
0a400/103290: H2V - host to vuc scratch register [vdec/vuc/isa.txt]
0a500/103294: V2H - vuc to host scratch register [vdec/vuc/isa.txt]
0a600/103298: PARM - sequence/picture/slice parameters required by vuc hardware, aliased to vuc special register $sr7 [$parm]
0a700/10329c: PC - program counter [vdec/vuc/isa.txt]
0a800/1032a0: VPRING_RESIDUAL.OFFSET - the VPRING_RESIDUAL offset
0a900/1032a4: VPRING_RESIDUAL.HALT_POS - the VPRING_RESIDUAL halt position
0aa00/1032a8: VPRING_RESIDUAL.WRITE_POS - the VPRING_RESIDUAL write position
0ab00/1032ac: VPRING_RESIDUAL.SIZE - the VPRING_RESIDUAL size
0ac00/1032b0: VPRING_CTRL.OFFSET - the VPRING_CTRL offset
0ad00/1032b4: VPRING_CTRL.HALT_POS - the VPRING_CTRL halt position
0ae00/1032b8: VPRING_CTRL.WRITE_POS - the VPRING_CTRL write position
0af00/1032bc: VPRING_CTRL.SIZE - the VPRING_CTRL size
0b000/1032c0: VPRING_DEBLOCK.OFFSET - the VPRING_DEBLOCK offset
0b100/1032c4: VPRING_DEBLOCK.HALT_POS - the VPRING_DEBLOCK halt position
0b200/1032c8: VPRING_DEBLOCK.WRITE_POS - the VPRING_DEBLOCK write position
0b300/1032cc: VPRING_DEBLOCK.SIZE - the VPRING_DEBLOCK size
0b400/1032d0: VPRING_TRIGGER - flush/resume triggers for the VPRINGs
0b500/1032d4: INTR - interrupt status
0b600/1032d8: INTR_EN - interrupt enable mask
0b700/1032dc: VPRING_ENABLE - enables VPRING access
0b800/1032e0: MVSURF_IN_OFFSET - MVSURF_IN offset [vdec/vuc/mvsurf.txt]
0b900/1032e4: MVSURF_IN_PARM - MVSURF_IN parameters [vdec/vuc/mvsurf.txt]
0ba00/1032e8: MVSURF_IN_LEFT - MVSURF_IN data left [vdec/vuc/mvsurf.txt]
0bb00/1032ec: MVSURF_IN_POS - MVSURF_IN position [vdec/vuc/mvsurf.txt]
0bc00/1032f0: MVSURF_OUT_OFFSET - MVSURF_OUT offset [vdec/vuc/mvsurf.txt]
0bd00/1032f4: MVSURF_OUT_PARM - MVSURF_OUT parameters [vdec/vuc/mvsurf.txt]
0be00/1032f8: MVSURF_OUT_LEFT - MVSURF_OUT space left [vdec/vuc/mvsurf.txt]
0bf00/1032fc: MVSURF_OUT_POS - MVSURF_OUT position [vdec/vuc/mvsurf.txt]
0c000/103300: MBRING_OFFSET - the MBRING offset
0c100/103304: MBRING_SIZE - the MBRING size
0c200/103308: MBRING_READ_POS - the MBRING read position
0c300/10330c: MBRING_READ_AVAIL - the bytes left to read in MBRING


The MMIO registers - VP3/VP4


The vuc registers are located in PPDEC falcon IO space at addresses 0x10000:0x14000 [BAR0 addresses
0x085400:0x085500]. They are:


10000:11000/085400:085440: DATA - vuc microprocessor data space [vdec/vuc/isa.txt]
11000/085440: CODE_CONTROL - code execution control [vdec/vuc/isa.txt]
11100/085444: CODE_WINDOW - code access window [vdec/vuc/isa.txt]
11200/085448: ICNT - executed instructions counter, aliased to vuc special register $sr15 [$icnt]
11300/08544c: WDCNT - watchdog count - when ICNT reaches the WDCNT value and WDCNT is not equal to 0xffff, a watchdog interrupt is raised
11400/085450: H2V - host to vuc scratch register [vdec/vuc/isa.txt]
11500/085454: V2H - vuc to host scratch register [vdec/vuc/isa.txt]
11600/085458: PARM - sequence/picture/slice parameters required by vuc hardware, aliased to vuc special register $sr7 [$parm]
11700/08545c: PC - program counter [vdec/vuc/isa.txt]
11800/085460: RPITAB - the address of the refidx -> RPI translation table
11900/085464: REFTAB - the address of the RPI -> VM address translation table
11a00/085468: BUSY - a status reg showing which subunits of vuc are busy
11c00/085470: INTR - interrupt status
11d00/085474: INTR_EN - interrupt enable mask
12000/085480: MVSURF_IN_ADDR - MVSURF_IN address [vdec/vuc/mvsurf.txt]
12100/085484: MVSURF_IN_PARM - MVSURF_IN parameters [vdec/vuc/mvsurf.txt]
12200/085488: MVSURF_IN_LEFT - MVSURF_IN data left [vdec/vuc/mvsurf.txt]
12300/08548c: MVSURF_IN_POS - MVSURF_IN position [vdec/vuc/mvsurf.txt]
12400/085490: MVSURF_OUT_ADDR - MVSURF_OUT address [vdec/vuc/mvsurf.txt]
12500/085494: MVSURF_OUT_PARM - MVSURF_OUT parameters [vdec/vuc/mvsurf.txt]
12600/085498: MVSURF_OUT_LEFT - MVSURF_OUT space left [vdec/vuc/mvsurf.txt]
12700/08549c: MVSURF_OUT_POS - MVSURF_OUT position [vdec/vuc/mvsurf.txt]
12800/0854a0: MBRING_OFFSET - the MBRING offset
12900/0854a4: MBRING_SIZE - the MBRING size
12a00/0854a8: MBRING_READ_POS - the MBRING read position
12b00/0854ac: MBRING_READ_AVAIL - the bytes left to read in MBRING
12c00/0854b0: ??? [XXX]
12d00/0854b4: ??? [XXX]
12e00/0854b8: ??? [XXX]
12f00/0854bc: STAT - control/status register [vdec/vuc/isa.txt]
13000/0854c0: ??? [XXX]
13100/0854c4: ??? [XXX]
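Both register lists follow the same pattern: each 0x100-byte block of the local space (XLMI on VP2, falcon IO on VP3/VP4) shows up as one 32-bit BAR0 register, and the 32-bit word within the block is the window index, which matches the `(i >> 6) * 4 [index i & 0x3f]` form used for the DATA window. The sketch below encodes that pattern. The rule is inferred from the listed address pairs rather than stated anywhere, and `local_to_bar0` and its `gen` parameter are hypothetical names.

```python
# Hypothetical helper: (local base, BAR0 base) per generation, taken from the
# two register lists above. VP4 is assumed to share the VP3 layout.
WINDOWS = {
    "vp2": (0x08000, 0x103200),   # PBSP XLMI space
    "vp3": (0x10000, 0x085400),   # PPDEC falcon IO space
}


def local_to_bar0(local_addr, gen="vp2"):
    """Return (BAR0 address, index within that BAR0 register's window)."""
    local_base, bar0_base = WINDOWS[gen]
    offset = local_addr - local_base
    # Inferred rule: each 0x100-byte block of local space is one 32-bit BAR0 register...
    bar0 = bar0_base + (offset >> 8) * 4
    # ...and the 32-bit word inside the block (0..0x3f) is the window index.
    index = (offset >> 2) & 0x3F
    return bar0, index


print([hex(local_to_bar0(a)[0]) for a in (0x0a000, 0x0a800, 0x0c300)])
# ['0x103280', '0x1032a0', '0x10330c']
```

For the plain registers the index is always 0; it only matters for the DATA window, where a single BAR0 register aliases 0x40 data words.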


Interrupts 


Todo: write me 


VP2/VP3/VP4 vuc ISA


Contents

* VP2/VP3/VP4 vuc ISA
  - Introduction
  - The delays
  - The opcode format
  - The code space and execution control
  - The data space
  - Instruction reference
    * Data movement instructions: slct, mov
    * Addition instructions: add, sub, subr, avgs, avgu
    * Comparison instructions: setgt, setlt, seteq, setlep, setzero
    * Clamping and sign extension instructions: clamplep, clamps, sext
    * Division by 2 instruction: div2s
    * Bit manipulation instructions: bset, bclr, btest
    * Swapping reg halves: hswap
    * Shift instructions: shl, shr, sar
    * Bitwise instructions: and, or, xor, not
    * Minmax instructions: min, max
    * Predicate instructions: and, or, xor
    * No operation: nop
    * Long multiplication instructions: lmulu, lmuls
    * Long arithmetic unary instructions: lsrr, ladd, lsar, ldivu
    * Control flow instructions: bra, call, ret
    * Memory access instructions: ld, st
  - The scratch special registers
  - The $stat special register
    * Sleep instruction: sleep
    * Wait for status bit instructions: wstc, wsts
  - The watchdog counter
    * Clear watchdog counter instruction: clicnt
  - Misc special registers

Introduction 


This file describes the ISA used by the vuc microprocessor, which is introduced in vdec/vuc/intro.txt.


The microprocessor registers, instructions and memory spaces are mostly 16-bit oriented. There are 3 ISA register 
files: 


* $r0-$r15, 16-bit general-purpose registers, for arithmetic and addressing
  - $r0: read-only and hardwired to 0
  - $r1-$r15: read/write
* $p0-$p15, 1-bit predicate registers, for conditional execution
  - $p0: read/write
  - $p1: read-only and hardwired to !$p0
  - $p2-$p14: read/write
  - $p15: read-only and hardwired to 1
* $sr0-$sr63, 16-bit special registers
  - $sr0/$asel: A neighbour read selection [VP2 only] [vdec/vuc/vreg.txt]
  - $sr1/$bsel: B neighbour read selection [VP2 only] [vdec/vuc/vreg.txt]
  - $sr2/$spidx: [sub]partition selection [vdec/vuc/vreg.txt]
  - $sr3/$baddr: B neighbour read address [VP2 only] [vdec/vuc/vreg.txt]
  - $sr3/$absel: A and B neighbour selection [VP3+ only] [vdec/vuc/vreg.txt]
  - $sr4/$h2v: host to vuc scratch register [vdec/vuc/isa.txt]
  - $sr5/$v2h: vuc to host scratch register [vdec/vuc/isa.txt]
  - $sr6/$stat: a control/status register [vdec/vuc/isa.txt]
  - $sr7/$parm: video parameters [vdec/vuc/vreg.txt]
  - $sr8/$pc: program counter [vdec/vuc/isa.txt]
  - $sr9/$cspos: call stack position [vdec/vuc/isa.txt]
  - $sr10/$cstop: call stack top [vdec/vuc/isa.txt]
  - $sr11/$rpitab: RPI lut pointer [VP2 only] [vdec/vuc/vreg.txt]
  - $sr12/$lhi: long arithmetic high word [vdec/vuc/isa.txt]
  - $sr13/$llo: long arithmetic low word [vdec/vuc/isa.txt]
  - $sr14/$pred: alias of $p register file [vdec/vuc/isa.txt]
  - $sr15/$icnt: cycle counter [vdec/vuc/isa.txt]
  - $sr16/$mvxl0: motion vector L0 X component [vdec/vuc/vreg.txt]
  - $sr17/$mvyl0: motion vector L0 Y component [vdec/vuc/vreg.txt]
  - $sr18/$mvxl1: motion vector L1 X component [vdec/vuc/vreg.txt]
  - $sr19/$mvyl1: motion vector L1 Y component [vdec/vuc/vreg.txt]
  - $sr20/$refl0: L0 refidx [vdec/vuc/vreg.txt]
  - $sr21/$refl1: L1 refidx [vdec/vuc/vreg.txt]
  - $sr22/$rpil0: L0 RPI [vdec/vuc/vreg.txt]
  - $sr23/$rpil1: L1 RPI [vdec/vuc/vreg.txt]
  - $sr24/$mbflags: macroblock flags [vdec/vuc/vreg.txt]
  - $sr25/$qpy: luma quantiser and intra chroma pred mode [vdec/vuc/vreg.txt]
  - $sr26/$qpc: chroma quantisers [vdec/vuc/vreg.txt]
  - $sr27/$mbpart: macroblock partitioning schema [vdec/vuc/vreg.txt]
  - $sr28/$mbxy: macroblock X and Y position [vdec/vuc/vreg.txt]
  - $sr29/$mbaddr: macroblock address [vdec/vuc/vreg.txt]
  - $sr30/$mbtype: macroblock type [vdec/vuc/vreg.txt]
  - $sr31/$submbtype: submacroblock types [VP2 only] [vdec/vuc/vreg.txt]
  - $sr32/$amvxl0: A neighbour's $mvxl0 [vdec/vuc/vreg.txt]
  - $sr33/$amvyl0: A neighbour's $mvyl0 [vdec/vuc/vreg.txt]
  - $sr34/$amvxl1: A neighbour's $mvxl1 [vdec/vuc/vreg.txt]
  - $sr35/$amvyl1: A neighbour's $mvyl1 [vdec/vuc/vreg.txt]
  - $sr36/$arefl0: A neighbour's $refl0 [vdec/vuc/vreg.txt]
  - $sr37/$arefl1: A neighbour's $refl1 [vdec/vuc/vreg.txt]
  - $sr38/$arpil0: A neighbour's $rpil0 [vdec/vuc/vreg.txt]
  - $sr39/$arpil1: A neighbour's $rpil1 [vdec/vuc/vreg.txt]
  - $sr40/$ambflags: A neighbour's $mbflags [vdec/vuc/vreg.txt]
  - $sr41/$aqpy: A neighbour's $qpy [VP2 only] [vdec/vuc/vreg.txt]
  - $sr42/$aqpc: A neighbour's $qpc [VP2 only] [vdec/vuc/vreg.txt]
  - $sr48/$bmvxl0: B neighbour's $mvxl0 [vdec/vuc/vreg.txt]
  - $sr49/$bmvyl0: B neighbour's $mvyl0 [vdec/vuc/vreg.txt]
  - $sr50/$bmvxl1: B neighbour's $mvxl1 [vdec/vuc/vreg.txt]
  - $sr51/$bmvyl1: B neighbour's $mvyl1 [vdec/vuc/vreg.txt]
  - $sr52/$brefl0: B neighbour's $refl0 [vdec/vuc/vreg.txt]
  - $sr53/$brefl1: B neighbour's $refl1 [vdec/vuc/vreg.txt]
  - $sr54/$brpil0: B neighbour's $rpil0 [vdec/vuc/vreg.txt]
  - $sr55/$brpil1: B neighbour's $rpil1 [vdec/vuc/vreg.txt]
  - $sr56/$bmbflags: B neighbour's $mbflags [vdec/vuc/vreg.txt]
  - $sr57/$bqpy: B neighbour's $qpy [vdec/vuc/vreg.txt]
  - $sr58/$bqpc: B neighbour's $qpc [vdec/vuc/vreg.txt]


There are 7 address spaces the vuc can access: 


D[] - user data [vdec/vuc/isa.txt]

PWT[] - pred weight table data, read-only. This space is filled when a packet of type 4 is read from the MBRING.
Byte-addressed, 0x200 bytes long, loads are in byte units.

VP[] - VPRING output data, write-only. Data stored here will be written to VPRING_DEBLOCK and
VPRING_CTRL when the corresponding commands are invoked. Byte-addressed, 0x400 bytes long. Stores are
in byte or word units depending on the address.


MVSI[] - MVSURF input data [read-only] [vdec/vuc/mvsurf.txt] 
MVSO[] - MVSURF output data [write-only] [vdec/vuc/mvsurf.txt] 
B6[] - io address space? [XXX] 

B7[] - io address space? [XXX] 


The vuc code resides in the code space, separate from the above spaces. The code space is a dedicated SRAM of
0x800 instruction words. An instruction word consists of 40 bits on VP2 and 30 bits on VP3+.


The delays 


The vuc lacks interlocks: on every cycle when the vuc microprocessor is active and not sleeping/waiting, one instruction
begins execution. Most instructions finish in one cycle. However, when an instruction takes more than one cycle to
finish, vuc will continue to fetch and execute subsequent instructions even if they depend on the current
instruction. It is thus up to the programmer to insert nops or schedule instructions so that such situations are avoided.


An X-cycle instruction happens in three phases:
* cycle 0: source read - the inputs to the instruction are gathered
* cycles 0..(X-1): result computation
* cycle X: destination writeout - the results are stored into the destination registers


For example, add $r1 $r2 $r3 is a 1-cycle instruction. On cycle 0, the sources are read and the result is computed. On
cycle 1, in parallel with the execution of the next instruction, the result is written out to $r1.


The extra cycle for destination writeout means that, in general, at least 1 unrelated instruction is required
between writing a register and reading it. However, vuc implements store-to-load forwarding for some common cases:
the result value, which is already known on cycle (X-1), is transferred directly into the next instruction if the
next instruction's source register index matches the current instruction's destination register index. Store-to-load
forwarding happens in the following situations:

* all $r register reads and writes
* all $p register reads and writes, except when they are accessed through the $pred special register
* $lhi/$llo register reads and writes done implicitly by long arithmetic instructions

Store-to-load forwarding does NOT happen in the following situations:

* $sr register reads and writes


Example 1:

    add $r1 $r2 $r3
    add $r4 $r1 $r5

No delay needed, store-to-load forwarding happens:

* cycle 0: $r2 and $r3 read, $r2+$r3 computed
* cycle 1: $r5 read, previous result read due to store-to-load forwarding match for $r1, prev+$r5 computed,
  previous result written to $r1
* cycle 2: next instruction begins execution, insn 1 result written to $r4
Example 2 [missing delay]:

    add $mvxl0 $r2 $r3
    add $r4 $mvxl0 $r5

Delay needed, but not supplied - store-to-load forwarding doesn't happen and the old value is read:

* cycle 0: $r2 and $r3 read, $r2+$r3 computed
* cycle 1: $mvxl0 and $r5 read, $mvxl0+$r5 computed, previous result written to $mvxl0
* cycle 2: next instruction begins execution, insn 1 result written to $r4

Code is equivalent to:

    $r4 = $mvxl0 + $r5;
    $mvxl0 = $r2 + $r3;


Example 3 [proper delay]:

    add $mvxl0 $r2 $r3
    nop
    add $r4 $mvxl0 $r5

Delay needed and supplied:

* cycle 0: $r2 and $r3 read, $r2+$r3 computed
* cycle 1: nop executes, previous result written to $mvxl0
* cycle 2: new $mvxl0 and $r5 read, $mvxl0+$r5 computed
* cycle 3: next instruction begins execution, insn 2 result written to $r4

Code is equivalent to:

    $mvxl0 = $r2 + $r3;
    $r4 = $mvxl0 + $r5;
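The three examples can be replayed with a toy timing model of 1-cycle instructions: sources are read on the issue cycle, the destination is written one cycle later, and a read sees the previous instruction's pending result only if the register is an $r register (store-to-load forwarding). This is a sketch under stated assumptions: only `add` and `nop` are modeled, $p/$lhi/$llo forwarding and $r0 being hardwired are ignored, and `run`/`is_gpr` are hypothetical names, not part of any real vuc tool.

```python
def is_gpr(name):
    # $sr names such as $rpitab or $refl0 also start with "$r", so require digits.
    return name.startswith("$r") and name[2:].isdigit()


def run(program, regs):
    """Replay 1-cycle instructions with vuc-style writeout timing.

    program: list of ("add", dst, src1, src2) or ("nop",) tuples
    regs: dict of register name -> 16-bit value; a modified copy is returned
    """
    regs = dict(regs)
    pending = None  # (dst, value) computed last cycle, written out this cycle

    def read(name):
        # Forwarding: only $r registers, only from the immediately preceding insn.
        if pending is not None and pending[0] == name and is_gpr(name):
            return pending[1]
        return regs[name]  # otherwise the register file value, possibly stale

    for insn in program:
        if insn[0] == "add":
            _, dst, src1, src2 = insn
            result = (dst, (read(src1) + read(src2)) & 0xFFFF)  # cycle 0: read + compute
        else:
            result = None  # nop produces nothing
        if pending is not None:
            regs[pending[0]] = pending[1]  # previous instruction's writeout cycle
        pending = result
    if pending is not None:
        regs[pending[0]] = pending[1]  # final writeout
    return regs


init = {"$r1": 0, "$r2": 1, "$r3": 2, "$r4": 0, "$r5": 10, "$mvxl0": 100}
ex2 = run([("add", "$mvxl0", "$r2", "$r3"), ("add", "$r4", "$mvxl0", "$r5")], init)
print(ex2["$r4"], ex2["$mvxl0"])  # 110 3: the stale $mvxl0 was read
```

Inserting `("nop",)` between the two instructions makes the second `add` see the new $mvxl0, as in Example 3.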


Since long-running instructions occupy execution units for their whole duration, it's usually forbidden to launch other
instructions that use the same execution units until the first instruction is finished. When such an execution unit conflict
happens, the old instruction is aborted.


It is possible that two instructions with different write delays will try to perform a register write in the same cycle
(e.g. an ld-nop-mov sequence). If the write destinations are different, both writes will happen as expected. If the write
destinations are the same, the destination carries the value of the last write.


The branch instructions take two cycles to finish - the instruction after the jump [the delay slot] is executed regardless 
of whether the jump is taken or not. 


The opcode format 


The opcode bits are:

* 0-4: opcode selection [OP]
* 5-6, base opcodes: predicate output mode [POM]
  - 00: $p &= predicate output
  - 01: $p |= predicate output
  - 10: $p = predicate output
  - 11: predicate output discarded
* 7, base opcodes: predicate output negation flag [PON]
* 5-7, special opcodes: special opcode class selection [OC]
  - 000: control flow
  - 001: io control
  - 010: predicate manipulation
  - 100: load/store
  - 101: long arithmetic
* 8-11: source 1 [SRC1]
* 12-15: source 2 [SRC2]
* 16-19: destination [DST]
* 8-18: branch target [BTARG]
* 20-23: predicate [PRED]
* 24-25: extra bits for immediate and $sr [EXT]
* 26: opcode type 0 [OT0]
* 27: source 2 immediate flag [IMMF]
* 28: opcode type 1 [OT1]
* 29: predicate enable flag [PE]
* 30-32: relative branch predicate [RBP] - VP2 only
* 33: relative branch predicate negation flag [RBN] - VP2 only
* 34-39: relative branch target [RBT] - VP2 only


On VP2, a single instruction word holds two instruction slots: the normal instruction slot in bits 0-29, and the relative
branch instruction slot in bits 30-39. When the instruction is executed, both instruction slots are executed
simultaneously and independently.


The relative branch slot can hold only one type of instruction, which is the relative branch. The main slot can hold all 
other types of instructions. It's possible to encode two different jumps in one opcode by utilising both the branch slot 
and the main instruction slot for a branch. The branch will take place if any of the two branch conditions match. If 
both branch conditions match, the actual branch executed is the one in the main slot. 


On VP3+, the relative branch slot no longer exists, and the main slot makes up the whole instruction word. 


There are two major types of opcodes that can be stored in the main slot: base opcodes and special opcodes. The type
of instruction in the main slot is determined by the OT0 and OT1 bits:

* OT0 = 0, OT1 = 0: base opcode, $r destination, $r source 1
* OT0 = 1, OT1 = 0: base opcode, $r destination, $sr source 1
* OT0 = 0, OT1 = 1: base opcode, $sr destination, $r source 1
* OT0 = 1, OT1 = 1: special opcode
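The bit list and the OT0/OT1 table can be turned into a small field splitter, which helps when reading raw code dumps. This is a sketch: `decode`, `bits` and the table names are hypothetical, overlapping fields (POM/OC, SRC1/BTARG and so on) are all extracted and left for the caller to interpret, and the VP2 relative branch fields are only read when asked for.

```python
# Main-slot fields as (lowest bit, width), transcribed from the bit list above.
FIELDS = {
    "OP": (0, 5), "POM": (5, 2), "PON": (7, 1), "OC": (5, 3),
    "SRC1": (8, 4), "SRC2": (12, 4), "DST": (16, 4), "BTARG": (8, 11),
    "PRED": (20, 4), "EXT": (24, 2), "OT0": (26, 1), "IMMF": (27, 1),
    "OT1": (28, 1), "PE": (29, 1),
}
# VP2 only: the relative branch slot in bits 30-39.
VP2_BRANCH_FIELDS = {"RBP": (30, 3), "RBN": (33, 1), "RBT": (34, 6)}

KINDS = {
    (0, 0): "base: $r dst, $r src1",
    (1, 0): "base: $r dst, $sr src1",
    (0, 1): "base: $sr dst, $r src1",
    (1, 1): "special",
}


def bits(word, lo, width):
    return (word >> lo) & ((1 << width) - 1)


def decode(word, vp2=False):
    """Return (instruction kind, dict of raw field values)."""
    layout = dict(FIELDS, **VP2_BRANCH_FIELDS) if vp2 else FIELDS
    f = {name: bits(word, lo, width) for name, (lo, width) in layout.items()}
    return KINDS[(f["OT0"], f["OT1"])], f


# OP=00100 (add), POM=11 (discard), SRC1=2, SRC2=3, DST=1: "add $r1 $r2 $r3"
kind, f = decode(0x13264)
print(kind, f["OP"], f["DST"], f["SRC1"], f["SRC2"])  # base: $r dst, $r src1 4 1 2 3
```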
For base opcodes, the OP bits determine the final opcode:

* 00000: slct [slct form] select
* 00001: mov [mov form] move
* 00100: add [binary form] add
* 00101: sub [binary form] subtract
* 00110: subr [binary form] subtract reverse [VP2 only]
* 00110: avgs [binary form] average signed [VP3+ only]
* 00111: avgu [binary form] average unsigned [VP3+ only]
* 01000: setgt [set form] set if greater than
* 01001: setlt [set form] set if less than
* 01010: seteq [set form] set if equal to
* 01011: setlep [set form] set if less or equal and positive
* 01100: clamplep [binary form] clamp to less or equal and positive
* 01101: clamps [binary form] clamp signed
* 01110: sext [binary form] sign extension
* 01111: setzero [set form] set if both zero [VP2 only]
* 01111: div2s [unary form] divide by 2 signed [VP3+ only]
* 10000: bset [binary form] bit set
* 10001: bclr [binary form] bit clear
* 10010: btest [set form] bit test
* 10100: hswap [unary form] swap reg halves
* 10101: shl [binary form] shift left
* 10110: shr [binary form] shift right
* 10111: sar [binary form] shift arithmetic right
* 11000: and [binary form] bitwise and
* 11001: or [binary form] bitwise or
* 11010: xor [binary form] bitwise xor
* 11011: not [unary form] bitwise not
* 11100: lut [binary form] video LUT lookup
* 11101: min [binary form] minimum [VP3+ only]
* 11110: max [binary form] maximum [VP3+ only]
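Because OP values 00110 and 01111 mean different instructions on VP2 and VP3+, a disassembler has to know the generation. A minimal lookup over the table above might look like this; `BASE_OPCODES` and `base_opcode` are hypothetical names, and VP4 is treated as "VP3+".

```python
# (OP, mnemonic, operand form, availability) from the base opcode table above.
# Availability: "all", "vp2" (VP2 only) or "vp3+" (VP3 and VP4).
BASE_OPCODES = [
    (0b00000, "slct", "slct", "all"), (0b00001, "mov", "mov", "all"),
    (0b00100, "add", "binary", "all"), (0b00101, "sub", "binary", "all"),
    (0b00110, "subr", "binary", "vp2"), (0b00110, "avgs", "binary", "vp3+"),
    (0b00111, "avgu", "binary", "vp3+"),
    (0b01000, "setgt", "set", "all"), (0b01001, "setlt", "set", "all"),
    (0b01010, "seteq", "set", "all"), (0b01011, "setlep", "set", "all"),
    (0b01100, "clamplep", "binary", "all"), (0b01101, "clamps", "binary", "all"),
    (0b01110, "sext", "binary", "all"),
    (0b01111, "setzero", "set", "vp2"), (0b01111, "div2s", "unary", "vp3+"),
    (0b10000, "bset", "binary", "all"), (0b10001, "bclr", "binary", "all"),
    (0b10010, "btest", "set", "all"), (0b10100, "hswap", "unary", "all"),
    (0b10101, "shl", "binary", "all"), (0b10110, "shr", "binary", "all"),
    (0b10111, "sar", "binary", "all"), (0b11000, "and", "binary", "all"),
    (0b11001, "or", "binary", "all"), (0b11010, "xor", "binary", "all"),
    (0b11011, "not", "unary", "all"), (0b11100, "lut", "binary", "all"),
    (0b11101, "min", "binary", "vp3+"), (0b11110, "max", "binary", "vp3+"),
]


def base_opcode(op, vp2):
    """Return (mnemonic, form) for a 5-bit OP value, or None if it is not listed."""
    for code, name, form, avail in BASE_OPCODES:
        if code == op and (avail == "all" or (avail == "vp2") == vp2):
            return name, form
    return None


print(base_opcode(0b00110, vp2=True), base_opcode(0b00110, vp2=False))
# ('subr', 'binary') ('avgs', 'binary')
```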


For special opcodes, the OC bits determine the opcode class, and the OP bits further determine the opcode inside that
class. The classes and opcodes are:

* OC 000: control flow
  - 00000: bra [branch form] branch
  - 00010: call [branch form] call
  - 00011: ret [simple form] return
  - 00100: sleep [simple form] sleep
  - 00101: wstc [immediate form] wait for status bit clear
  - 00110: wsts [immediate form] wait for status bit set
* OC 001: io control
  - 00000: clicnt [simple form] clear instruction counter
  - 00001: ??? [XXX] [simple form]
  - 00010: ??? [XXX] [simple form]
  - 00011: ??? [XXX] [simple form]
  - 00100: mbiread [simple form] macroblock input read
  - 00101: ??? [XXX] [simple form]
  - 00110: ??? [XXX] [simple form]
  - 01000: mbinext [simple form] macroblock input next
  - 01001: mvsread [simple form] MVSURF read
  - 01010: mvswrite [simple form] MVSURF write
  - 01011: ??? [XXX] [simple form]
  - 01100: ??? [XXX] [simple form]
* OC 010: predicate manipulation
  - xxx00: and [predicate form] and
  - xxx01: or [predicate form] or
  - xxx10: xor [predicate form] xor
  - xxx11: nop [simple form] no operation
* OC 100: load/store
  - xxxx0: st [store form] store
  - xxxx1: ld [load form] load
* OC 101: long arithmetic
  - 00000: lmulu [long binary form] long multiply unsigned
  - 00001: lmuls [long binary form] long multiply signed
  - 00010: lsrr [long unary form] long shift right with round
  - 00100: ladd [long unary form] long add [VP3+ only]
  - 01000: lsar [long unary form] long shift right arithmetic [VP3+ only]
  - 01100: ldivu [long unary form] long divide unsigned [VP4 only]


All main slot opcodes can be predicated by an arbitrary $p register. The PE bit enables predication. If PE bit is 1, the 
main slot instruction will only have an effect if the $p register selected by PRED field has value 1. Note that PE bit 
also has an effect on instruction format - longer immediates are allowed, and the predicate destination field changes. 


Note that, for some formats, opcode fields may be used for multiple purposes. For example, a mov instruction with
PE=1 and IMMF=1 uses the PRED bitfield both as the predicate selector and as the middle part of the immediate operand.
Such formats should be avoided unless it can be guaranteed that the value in the field fits all the purposes
it's used for.


The base opcodes have the following operands:
* binary form: pdst, dst, src1, src2
* unary form: pdst, dst, src1
* set form: pdst, src1, src2
* slct form: pdst, dst, pred, src1, src2
* mov form: pdst, dst, lsrc


The operands and their encodings are:

* pdst: predicate destination - this operand is special, as it can be used in several modes. First, the instruction
  generates a boolean predicate result. Then, if the PON bit is set, this output is negated. Finally, it is stored to a $p
  register in one of 4 modes:
  - POM = 00: $p &= output
  - POM = 01: $p |= output
  - POM = 10: $p = output
  - POM = 11: output is discarded

  The $p output register is:
  - PE = 0: $p register selected by PRED field
  - PE = 1: $p register selected by DST field
* dst: main destination
  - OT0 = 1 or OT1 = 0: $r register selected by DST field
  - OT0 = 0 and OT1 = 1: $sr register selected by DST [low bits] and EXT [high bits] fields
* pred: predicate source
  - all cases: $p register selected by PRED field
* src1: first source
  - OT0 = 0 or OT1 = 1: $r register selected by SRC1 field
  - OT0 = 1 and OT1 = 0: $sr register selected by SRC1 [low bits] and EXT [high bits] fields
* src2: second source
  - IMMF = 0: $r register selected by SRC2 field
  - IMMF = 1 and OT0 = OT1: zero-extended 6-bit immediate value stored in SRC2 [low bits] and EXT
    [high bits] fields
  - IMMF = 1 and OT0 != OT1: zero-extended 4-bit immediate value stored in SRC2 field
* lsrc: long source
  - IMMF = 0: $r register selected by SRC2 field
  - IMMF = 1 and OT1 = 0: zero-extended 14-bit immediate value stored in SRC1 [low bits], SRC2 [low
    middle bits], PRED [high middle bits] and EXT [high bits] fields
  - IMMF = 1 and OT1 = 1: zero-extended 12-bit immediate value stored in SRC1 [low bits], SRC2 [middle
    bits] and PRED [high bits] fields
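The immediate encodings above reduce to concatenating 4-bit fields, with EXT added as the top bits only when it is not already holding the high bits of an $sr index. The sketch below assembles src2 and lsrc immediates and $sr numbers from a raw word. The helper names are hypothetical, and the text's remark that the PE bit also changes the immediate formats is not modeled.

```python
def bits(word, lo, width):
    return (word >> lo) & ((1 << width) - 1)


def src2_immediate(word):
    """src2 when IMMF = 1, zero-extended."""
    ot0, ot1 = bits(word, 26, 1), bits(word, 28, 1)
    src2, ext = bits(word, 12, 4), bits(word, 24, 2)
    if ot0 == ot1:
        return src2 | (ext << 4)   # 6 bits: SRC2 low, EXT high
    return src2                    # 4 bits: EXT holds the $sr index high bits


def lsrc_immediate(word):
    """lsrc (mov form) when IMMF = 1, zero-extended."""
    src1, src2 = bits(word, 8, 4), bits(word, 12, 4)
    pred, ext = bits(word, 20, 4), bits(word, 24, 2)
    value = src1 | (src2 << 4) | (pred << 8)   # 12 bits: SRC1, SRC2, PRED
    if bits(word, 28, 1) == 0:                 # OT1 = 0: EXT is free, 14 bits
        value |= ext << 12
    return value


def sr_index(low4, ext):
    """$sr number from a 4-bit field (DST or SRC1) with EXT as the high bits."""
    return low4 | (ext << 4)


word = (0xA << 8) | (0x5 << 12) | (0x3 << 20) | (0x2 << 24)  # SRC1, SRC2, PRED, EXT
print(hex(lsrc_immediate(word)), hex(lsrc_immediate(word | (1 << 28))))  # 0x235a 0x35a
```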
The special opcodes have the following operands:

* simple form: [none]
* immediate form: imm4
* branch form: btarg
* predicate form: spdst, psrc1, psrc2
* store form: space[dst + src1 * 2], src2 [if IMMF is 0]
* store form: space[src1 + stoff], src2 [if IMMF is 1]
* load form: dst, space[src1 + ldoff] [if IMMF is 0]
* load form: dst, space[src1 + src2] [if IMMF is 1]
* long binary form: src1, src2
* long unary form: src2


The operands and their encodings are:

* src1, src2, dst: like for base opcodes
* imm4: 4-bit immediate
  - all cases: 4-bit immediate stored in SRC2 field
* btarg: code address
  - all cases: 11-bit immediate stored in BTARG field
* spdst: predicate destination
  - PE = 0: $p register selected by PRED field
  - PE = 1: $p register selected by DST field
* psrc1: predicate source 1, optionally negated
  - all cases: $p register selected by SRC1 field, negated if bit 3 of OP field is set
* psrc2: predicate source 2, optionally negated
  - all cases: $p register selected by SRC2 field, negated if bit 2 of OP field is set
* space: memory space selection, OP field bits 1-4:
  - 0000: D[]
  - 0001: PWT[] - ld only
  - 0010: VP[] - st only
  - 0100: MVSI[] - ld only
  - 0101: MVSO[] - st only
  - 0110: B6[]
  - 0111: B7[]
* stoff: store offset
  - PE = 0: 10-bit zero-extended immediate stored in DST [low bits], PRED [middle bits] and EXT [high bits]
    fields
  - PE = 1: 6-bit zero-extended immediate stored in DST [low bits] and EXT [high bits] fields
* ldoff: load offset
  - PE = 0: 10-bit zero-extended immediate stored in SRC2 [low bits], PRED [middle bits] and EXT [high
    bits] fields
  - PE = 1: 6-bit zero-extended immediate stored in SRC2 [low bits] and EXT [high bits] fields
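For a load/store opcode, the mnemonic, memory space and immediate offset can be pulled out as follows. This is a sketch: `decode_ldst` and `SPACES` are hypothetical names, the caller is assumed to have already checked OT0 = OT1 = 1 and OC = 100, and the register operands are left undecoded because the text describes them only as "like for base opcodes".

```python
SPACES = {0b0000: "D", 0b0001: "PWT", 0b0010: "VP", 0b0100: "MVSI",
          0b0101: "MVSO", 0b0110: "B6", 0b0111: "B7"}


def bits(word, lo, width):
    return (word >> lo) & ((1 << width) - 1)


def imm_offset(low4, word):
    """stoff/ldoff: 10 bits (low4, PRED, EXT) if PE = 0, else 6 bits (low4, EXT)."""
    pred, ext, pe = bits(word, 20, 4), bits(word, 24, 2), bits(word, 29, 1)
    if pe:
        return low4 | (ext << 4)
    return low4 | (pred << 4) | (ext << 8)


def decode_ldst(word):
    """Return (mnemonic, space name or None, immediate offset or None)."""
    op = bits(word, 0, 5)
    mnemonic = "ld" if op & 1 else "st"   # OP xxxx1 = ld, xxxx0 = st
    space = SPACES.get(op >> 1)           # OP bits 1-4; None if not listed
    immf = bits(word, 27, 1)
    if mnemonic == "st" and immf:
        return mnemonic, space, imm_offset(bits(word, 16, 4), word)  # stoff: DST low
    if mnemonic == "ld" and not immf:
        return mnemonic, space, imm_offset(bits(word, 12, 4), word)  # ldoff: SRC2 low
    return mnemonic, space, None          # this form has no stoff/ldoff field


SPECIAL_LDST = (4 << 5) | (1 << 26) | (1 << 28)   # OC = 100, OT0 = OT1 = 1
word = SPECIAL_LDST | 9 | (3 << 12) | (2 << 20) | (1 << 24)  # ld MVSI, PE = 0
print(decode_ldst(word))  # ('ld', 'MVSI', 291)
```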


The code space and execution control 


The vuc executes instructions from dedicated code SRAM. The code SRAM is made of 0x800 cells, with each cell
holding one opcode. Thus, a cell is 40 bits wide on VP2 and 30 bits wide on VP3+. The code space is addressed in opcode
units, with addresses 0-0x7ff. The only way to access the code space other than via executing instructions from it is
through the code port:

BAR0 0x103288 / XLMI 0x0a200: CODE_CONTROL [VP2]
BAR0 0x085440 / I[0x11000]: CODE_CONTROL [VP3+]

  bits 0-10: ADDR, cell address to access by CODE_WINDOW
  bit 16: STATE, code execution control: 0 - code is being executed, CODE_WINDOW doesn't work;
  1 - microprocessor is halted, CODE_WINDOW is enabled

BAR0 0x10328c / XLMI 0x0a300: CODE_WINDOW [VP2]
BAR0 0x085444 / I[0x11100]: CODE_WINDOW [VP3+]

  Accesses the code space - see below


On VP3+, reading or writing the CODE_WINDOW register will cause a read/write of the code space cell selected by 
ADDR, with the cell value taken from / appearing at bits 0-29 of CODE_WINDOW. ADDR is auto-incremented by 1 
with each access. 


On VP2, since code space cells are 40 bits long, accessing a cell requires two accesses to CODE_WINDOW. The cell
is divided into a 32-bit low part and an 8-bit high part. There is an invisible 1-bit flipflop that selects whether the high part
or the low part will be accessed next. The low part is accessed first, then the high part. Writing CODE_CONTROL
resets the flipflop to the low part. Accessing CODE_WINDOW with the flipflop set to the low part accesses
the low part, then switches the flipflop to the high part. Accessing CODE_WINDOW with the flipflop set to the high
part accesses the high part [through bits 0-7 of CODE_WINDOW], switches the flipflop to the low part, and
auto-increments ADDR by 1. In addition, writes through CODE_WINDOW are buffered: writing the low part writes a
shadow register, and writing the high part assembles it with the current shadow register value and writes the concatenated
result to the code space.
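The flipflop and shadow-register behaviour is easy to get wrong when writing an uploader, so here is a toy model of the VP2 port as described. It is an illustration only: `Vp2CodePort` is a hypothetical class, the STATE bit is ignored (the port is treated as always enabled), and wrapping ADDR at 0x800 is an assumption the text does not state.

```python
class Vp2CodePort:
    """Toy model of the VP2 CODE_CONTROL / CODE_WINDOW pair (illustration only)."""

    def __init__(self):
        self.code = [0] * 0x800   # 40-bit code cells
        self.addr = 0             # CODE_CONTROL.ADDR
        self.high_next = False    # the invisible low/high flipflop
        self.shadow = 0           # buffered low 32 bits of a pending write

    def write_control(self, value):
        self.addr = value & 0x7FF     # bits 0-10: ADDR (STATE is not modeled)
        self.high_next = False        # any CODE_CONTROL write resets the flipflop

    def _advance(self):
        self.high_next = not self.high_next
        if not self.high_next:        # a high-part access just completed
            self.addr = (self.addr + 1) & 0x7FF   # assumption: ADDR wraps at 0x800

    def write_window(self, value):
        if self.high_next:
            # High part: combine with the shadow register and commit the 40-bit cell.
            self.code[self.addr] = ((value & 0xFF) << 32) | self.shadow
        else:
            self.shadow = value & 0xFFFFFFFF      # low part only fills the shadow
        self._advance()

    def read_window(self):
        cell = self.code[self.addr]
        value = (cell >> 32) & 0xFF if self.high_next else cell & 0xFFFFFFFF
        self._advance()
        return value


port = Vp2CodePort()
port.write_control((1 << 16) | 5)   # STATE = 1 (halted), ADDR = 5
port.write_window(0x89ABCDEF)       # low 32 bits go to the shadow register
port.write_window(0x12)             # high 8 bits: cell 5 is written, ADDR -> 6
print(hex(port.code[5]), port.addr)  # 0x1289abcdef 6
```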


The STATE bit is used to control vuc execution. This bit is set to 1 when the vuc is reset. When this bit is changed
from 1 to 0, the vuc starts executing instructions starting from code address 0. When this bit is changed from 0 to 1,
the vuc execution is halted.
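On VP3+, where one CODE_WINDOW access covers a whole cell, uploading microcode and starting it could be sketched as below: halt with STATE = 1, stream the opcodes through the auto-incrementing window, then clear STATE to start at address 0. `write32` is a hypothetical BAR0 write callback supplied by the caller, so this is not real driver code, and any required settling or waiting between steps is not covered by the text.

```python
CODE_CONTROL = 0x085440   # BAR0 address on VP3+, from the register list
CODE_WINDOW = 0x085444
STATE = 1 << 16           # CODE_CONTROL bit 16: 1 = halted, CODE_WINDOW usable


def upload_and_start(write32, code):
    """Halt the vuc, load 30-bit opcodes at address 0, then start execution.

    write32(addr, value) is a hypothetical BAR0 MMIO write callback.
    """
    if len(code) > 0x800:
        raise ValueError("the code space holds only 0x800 opcodes")
    write32(CODE_CONTROL, STATE | 0)               # STATE = 1 (halt), ADDR = 0
    for opcode in code:
        write32(CODE_WINDOW, opcode & 0x3FFFFFFF)  # ADDR auto-increments per access
    write32(CODE_CONTROL, 0)                       # STATE 1 -> 0: run from address 0


log = []
upload_and_start(lambda addr, value: log.append((addr, value)), [0x1, 0x2])
# log == [(CODE_CONTROL, STATE), (CODE_WINDOW, 1), (CODE_WINDOW, 2), (CODE_CONTROL, 0)]
```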


The data space 


D[] is a read-write memory space consisting of 0x800 16-bit cells. Every address in range 0-0x7ff corresponds to one
cell. The D[] space is used for three purposes:

* to store general-purpose data by microcode/host and communicate between the microcode and the host
* to store the RPI table, a mapping from bitstream reference indices to hw surface indices [RPIs], used directly by
  hardware [vdec/vuc/vreg.txt]
* to store the REF table, a mapping from RPIs to surface VM addresses, used directly by hardware [VP3+]
  [vdec/vuc/vreg.txt]

On VP2, the D[] space can be accessed from the host directly by using the DATA window:

BAR0 0x103200 + (i >> 6) * 4 [index i & 0x3f] / XLMI 0x08000 + i * 4, i < 0x800: DATA[i] [VP2]
  Accesses the data space - low 16 bits of DATA[i] go to D[] cell i, high 16 bits are unused.


On VP3+, the DATA window also exists, but cells are accessed in pairs: 


BAR0 0x085400 + (i >> 6) * 4 [index i & 0x3f] / I[0x10000 + i * 4], i < 0x400: DATA[i] [VP3+]
  Accesses the data space - low 16 bits of DATA[i] go to D[] cell i*2, high 16 bits go to D[] cell i*2+1.


The D[] space can be both read and written via the DATA window. 
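Since each VP3+ DATA[i] word carries two D[] cells, a host-side buffer of 16-bit cells has to be packed in pairs before it is written, and unpacked after it is read. A minimal sketch with hypothetical helper names; padding an odd tail with a zero cell is our own choice, and because a DATA[i] write covers both cells, changing a single cell presumably requires a read-modify-write of the pair (an inference, not stated in the text).

```python
def cells_to_data_words(cells):
    """Pack 16-bit D[] cells into VP3+ DATA[i] values (cell 2i low, cell 2i+1 high)."""
    cells = list(cells)
    if len(cells) % 2:
        cells.append(0)   # assumption: pad an odd tail with a zero cell
    return [(cells[2 * i] & 0xFFFF) | ((cells[2 * i + 1] & 0xFFFF) << 16)
            for i in range(len(cells) // 2)]


def data_words_to_cells(words):
    """Split each DATA[i] value back into D[] cells 2i and 2i+1."""
    cells = []
    for w in words:
        cells += [w & 0xFFFF, (w >> 16) & 0xFFFF]
    return cells


print([hex(w) for w in cells_to_data_words([0x1234, 0xABCD, 0x0001])])
# ['0xabcd1234', '0x1']
```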


Instruction reference 


In the pseudocode, all intermediate computation results and temporary variables are assumed to be infinite-precision 
signed integers: non-negative integers are padded at the left with infinite number of 0 bits, while negative integers are 
padded with infinite number of 1 bits. 


When assigning a result to a finite-precision register, any extra bits are chopped off. When reading a value from a 
finite-precision register, it's padded with infinite number of 0 bits at the left by default. A sign-extension read, where 
the register value is padded with infinite number of copies of its MSB instead, is written as SEX(reg). 
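Python integers already behave like the pseudocode's infinite-precision signed integers, so the two register conventions reduce to a pair of tiny helpers. The names `sex16` and `write16` are hypothetical; they assume 16-bit registers as used throughout this file.

```python
def sex16(value):
    """SEX(reg) for a 16-bit register: pad with copies of bit 15 instead of zeros."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def write16(result):
    """Assigning an infinite-precision result to a 16-bit register keeps the low 16 bits."""
    return result & 0xFFFF


print(sex16(0xFFFF), sex16(0x7FFF), hex(write16(-1)))  # -1 32767 0xffff
```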


Operators used in the pseudocode behave as in C.

Some instructions are described elsewhere. They are:

* lut [vdec/vuc/vreg.txt]
* sleep [in $stat register description]
* wstc [in $stat register description]
* wsts [in $stat register description]
* clicnt [XXX]
* mbiread [vdec/vuc/vreg.txt]
* mbinext [vdec/vuc/vreg.txt]
* mvsread [vdec/vuc/mvsurf.txt]
* mvswrite [vdec/vuc/mvsurf.txt]


Data movement instructions: slct, mov 


mov sets the destination to the value of the only source. slct sets the destination to the value of one of the sources, as 
selected by a predicate. 


Instruction: slct pdst, dst, pred, src1, src2
Opcode: base opcode, OP = 00000
Operation:

result = (pred ? src1 : src2);
dst = result;
pdst = result & 1;

Execution time: 1 cycle
Predicate output: LSB of normal result

Instruction: mov pdst, dst, lsrc
Opcode: base opcode, OP = 00001
Operation:

result = lsrc;
dst = result;
pdst = result & 1;

Execution time: 1 cycle
Predicate output: LSB of normal result
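Every base instruction ends by feeding its predicate output through the PON/POM logic from the operand encoding section, so slct and mov can be modeled together with that final step. This sketch uses hypothetical function names and ignores which $p register is targeted (including the read-only $p1 and $p15).

```python
def apply_pdst(p_old, output, pom, pon):
    """Store a predicate output into a $p register according to POM/PON."""
    if pon:
        output ^= 1                # PON set: negate the output first
    if pom == 0b00:
        return p_old & output      # $p &= output
    if pom == 0b01:
        return p_old | output      # $p |= output
    if pom == 0b10:
        return output              # $p = output
    return p_old                   # POM = 11: output discarded


def slct(pred, src1, src2):
    """Return (dst, predicate output) for slct."""
    result = src1 if pred else src2
    return result & 0xFFFF, result & 1


def mov(lsrc):
    """Return (dst, predicate output) for mov."""
    return lsrc & 0xFFFF, lsrc & 1


dst, out = slct(1, 0x11, 0x22)
print(hex(dst), apply_pdst(0, out, pom=0b01, pon=0))  # 0x11 1
```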
Addition instructions: add, sub, subr, avgs, avgu

add performs an addition of two 16-bit quantities; sub and subr perform subtraction, subr with the reversed order of
operands. avgs and avgu compute the signed and unsigned average of the two sources, rounding up. If the predicate output is
used, the predicate is set to the lowest bit of the result.

Instructions::

    add pdst, dst, src1, src2    OP=00100
    sub pdst, dst, src1, src2    OP=00101
    subr pdst, dst, src1, src2   OP=00110 [VP2 only]
    avgs pdst, dst, src1, src2   OP=00110 [VP3+ only]
    avgu pdst, dst, src1, src2   OP=00111 [VP3+ only]

Opcode: base opcode, OP as above
Operation:

if (op == add) result = src1 + src2;
if (op == sub) result = src1 - src2;
if (op == subr) result = src2 - src1;
if (op == avgs) result = (SEX(src1) + SEX(src2) + 1) >> 1;
if (op == avgu) result = (src1 + src2 + 1) >> 1;
dst = result;
pdst = result & 1;

Execution time: 1 cycle
Predicate output: LSB of normal result
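A direct transcription of the addition pseudocode, useful for checking the rounding of the two averages on edge values. `addition_op` is a hypothetical name; Python's `>>` on negative integers floors, which matches the infinite-precision arithmetic shift the pseudocode assumes.

```python
def sex16(v):
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def addition_op(op, src1, src2):
    """Return (dst, pdst) for add/sub/subr/avgs/avgu on 16-bit operands."""
    if op == "add":
        result = src1 + src2
    elif op == "sub":
        result = src1 - src2
    elif op == "subr":
        result = src2 - src1
    elif op == "avgs":
        result = (sex16(src1) + sex16(src2) + 1) >> 1   # signed, rounds up
    elif op == "avgu":
        result = (src1 + src2 + 1) >> 1                 # unsigned, rounds up
    else:
        raise ValueError(op)
    return result & 0xFFFF, result & 1                  # 16-bit dst, LSB as pdst


print([hex(addition_op(op, 0x8000, 0)[0]) for op in ("avgs", "avgu")])
# ['0xc000', '0x4000']
```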


Comparison instructions: setgt, setlt, seteq, setlep, setzero

setgt, setlt and seteq perform signed >, <, == comparisons on the two source operands and return the result as pdst.
setlep returns 1 if src1 is in the range [0, src2]. All comparisons are signed 16-bit. setzero returns 1 if both src1 and
src2 are equal to 0.

Instructions::

    setgt pdst, src1, src2     OP=01000
    setlt pdst, src1, src2     OP=01001
    seteq pdst, src1, src2     OP=01010
    setlep pdst, src1, src2    OP=01011
    setzero pdst, src1, src2   OP=01111 [VP2 only]

Opcode: base opcode, OP as above
Operation:

if (op == setgt) result = SEX(src1) > SEX(src2);
if (op == setlt) result = SEX(src1) < SEX(src2);
if (op == seteq) result = src1 == src2;
if (op == setlep) result = SEX(src1) <= SEX(src2) && SEX(src1) >= 0;
if (op == setzero) result = src1 == 0 && src2 == 0;
pdst = result;

Execution time: 1 cycle
Predicate output: the comparison result
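The comparisons can be checked against a small model. It follows the prose (setgt is "greater than") and the setlep pseudocode, which accepts src1 = 0, so the range is closed at 0. `compare_op` is a hypothetical name.

```python
def sex16(v):
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def compare_op(op, src1, src2):
    """Return the predicate result (0 or 1) of a comparison instruction."""
    s1, s2 = sex16(src1), sex16(src2)
    if op == "setgt":
        return int(s1 > s2)
    if op == "setlt":
        return int(s1 < s2)
    if op == "seteq":
        return int(src1 == src2)
    if op == "setlep":
        return int(0 <= s1 <= s2)          # src1 in [0, src2]
    if op == "setzero":
        return int(src1 == 0 and src2 == 0)
    raise ValueError(op)


print(compare_op("setgt", 0xFFFF, 0), compare_op("setlt", 0xFFFF, 0))  # 0 1
```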


Clamping and sign extension instructions: clamplep, clamps, sext 


clamplep clamps ѕгс1 to (0, src2] range. clamps, like the xtensa instruction of the same name, clamps src! to [-(1 
<< src2), (1 << src2) - 1] range, ie. to the set of (src2+1)-bit signed integers. sext, like the xteansa and falcon 
instructions of the same name, replaces bits src2 and up with a copy of bit src2, effectively doing a sign extension 
from a (src2+1)-bit signed number. 


Instructions:: clamplep pdst, dst, srcl, src2 ОР-01100 clamps pdst, dst, ѕгс1, src2 ОР-01101 sext pdst, dst, srcl, 
src2 ОР-01110 


Opcode: base opcode, OP as above

Operation:

    if (op == clamplep) {
        result = src1;
        presult = 0;
        if (SEX(src1) < 0) {
            presult = 1;
            result = 0;
        }
        if (SEX(src1) > SEX(src2)) {
            presult = 1;
            result = src2;
        }
    }
    if (op == clamps) {
        bit = src2 & 0xf;
        result = src1;
        presult = 0;
        if (SEX(src1) < -(1 << bit)) {
            result = -(1 << bit);
            presult = 1;
        }
        if (SEX(src1) > (1 << bit) - 1) {
            result = (1 << bit) - 1;
            presult = 1;
        }
    }
    if (op == sext) {
        bit = src2 & 0xf;
        presult = src1 >> bit & 1;
        if (presult)
            result = src1 | -(1 << bit);
        else
            result = src1 & ((1 << bit) - 1);
    }
    dst = result;
    pdst = presult;

Execution time: 1 cycle

Predicate output:

* clamplep, clamps: 1 if clamping happened
* sext: 1 if result < 0
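The clamps and sext behaviour can be sketched in Python as follows, assuming 16-bit register values held in Python ints; `clamps`, `sext` and `sex16` are illustrative names only, not part of any real tool. With src2 = 7, clamps saturates to the 8-bit signed range [-128, 127] and sext sign-extends from bit 7.

```python
def sex16(x):
    """Stand-in for SEX(): read a 16-bit register value as signed."""
    x &= 0xffff
    return x - 0x10000 if x & 0x8000 else x


def clamps(src1, src2):
    """Return (dst, pdst): saturate src1 to the (bit+1)-bit signed range."""
    bit = src2 & 0xf
    lo, hi = -(1 << bit), (1 << bit) - 1
    value = sex16(src1)
    if value < lo:
        return lo & 0xffff, 1
    if value > hi:
        return hi & 0xffff, 1
    return src1 & 0xffff, 0


def sext(src1, src2):
    """Return (dst, pdst): sign-extend src1 from bit (src2 & 0xf)."""
    bit = src2 & 0xf
    presult = src1 >> bit & 1
    if presult:
        result = src1 | -(1 << bit)       # set the selected bit and all above it
    else:
        result = src1 & ((1 << bit) - 1)  # clear the selected bit and all above it
    return result & 0xffff, presult
```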


Division by 2 instruction: div2s 


div2s divides a signed number by 2, rounding towards 0.

Instructions:

    div2s     pdst, dst, src1    OP=01111    [VP3+ only]

Opcode: base opcode, OP as above

Operation:

    if (SEX(src1) < 0) {
        result = (SEX(src1) + 1) >> 1;
    } else {
        result = src1 >> 1;
    }
    dst = result;
    pdst = result < 0;

Execution time: 1 cycle

Predicate output: 1 if result is negative
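div2s has to special-case negative inputs because an arithmetic right shift alone rounds towards minus infinity. A minimal Python sketch of the pseudocode, assuming 16-bit register values (`div2s` and `sex16` are hypothetical helper names):

```python
def sex16(x):
    """Stand-in for SEX(): read a 16-bit register value as signed."""
    x &= 0xffff
    return x - 0x10000 if x & 0x8000 else x


def div2s(src1):
    """Return (dst, pdst) for div2s: signed halving, rounding towards 0."""
    value = sex16(src1)
    # Python's >> floors, so negative inputs get +1 first, as in the pseudocode
    result = (value + 1) >> 1 if value < 0 else value >> 1
    return result & 0xffff, int(result < 0)
```

For example, -3 (0xfffd) becomes -1 rather than the -2 that a plain arithmetic shift would give.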
Bit manipulation instructions: bset, bclr, btest

bset and bclr set or clear a single bit in a value. btest copies a selected bit to a $p register.

Instructions:

    bset      pdst, dst, src1, src2    OP=10000
    bclr      pdst, dst, src1, src2    OP=10001
    btest     pdst, src1, src2         OP=10010

Opcode: base opcode, OP as above

Operation:

    bit = src2 & 0xf;
    if (op == bset) {
        result = src1 | 1 << bit;
        presult = result & 1;
        dst = result;
    }
    if (op == bclr) {
        result = src1 & ~(1 << bit);
        presult = result & 1;
        dst = result;
    }
    if (op == btest) {
        presult = src1 >> bit & 1;
    }
    pdst = presult;

Execution time: 1 cycle

Predicate output:

* bset, bclr: bit 0 of the result
* btest: the selected bit


Swapping reg halves: hswap 


hswap, like the falcon instruction of the same name, rotates a value by half its size, which is always 8 bits for vuc. 
Instructions:

    hswap     pdst, dst, src1    OP=10100

Opcode: base opcode, OP as above

Operation:

    result = src1 >> 8 | src1 << 8;
    dst = result;
    pdst = result & 1;

Execution time: 1 cycle

Predicate output: bit 0 of the result


Shift instructions: shl, shr, sar 


shl does a left shift, shr does a logical right shift, sar does an arithmetic right shift. 


Instructions:

    shl       pdst, dst, src1, src2    OP=10101
    shr       pdst, dst, src1, src2    OP=10110
    sar       pdst, dst, src1, src2    OP=10111

Opcode: base opcode, OP as above

Operation:

    shift = src2 & 0xf;
    if (op == shl) {
        result = src1 << shift;
        presult = result >> 16 & 1;
    }
    if (op == shr) {
        result = src1 >> shift;
        if (shift != 0) {
            presult = src1 >> (shift - 1) & 1;
        } else {
            presult = 0;
        }
    }
    if (op == sar) {
        result = SEX(src1) >> shift;
        if (shift != 0) {
            presult = src1 >> (shift - 1) & 1;
        } else {
            presult = 0;
        }
    }
    dst = result;
    pdst = presult;

Execution time: 1 cycle

Predicate output: the last bit shifted out
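The "last bit shifted out" rule is easiest to see in a model. This Python sketch follows the pseudocode above, assuming 16-bit register values; `shift` and `sex16` are hypothetical names. Note that a shift count of 0 leaves pdst at 0 for all three instructions.

```python
def sex16(x):
    """Stand-in for SEX(): read a 16-bit register value as signed."""
    x &= 0xffff
    return x - 0x10000 if x & 0x8000 else x


def shift(op, src1, src2):
    """Return (dst, pdst) for shl/shr/sar; pdst is the last bit shifted out."""
    src1 &= 0xffff
    sh = src2 & 0xf
    if op == "shl":
        result = src1 << sh
        presult = result >> 16 & 1           # bit pushed out past bit 15
    elif op in ("shr", "sar"):
        value = sex16(src1) if op == "sar" else src1
        result = value >> sh
        presult = src1 >> (sh - 1) & 1 if sh else 0
    else:
        raise ValueError("unknown op: " + op)
    return result & 0xffff, presult
```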


Bitwise instructions: and, or, xor, not 


No comment. 


Instructions:

    and       pdst, dst, src1, src2    OP=11000
    or        pdst, dst, src1, src2    OP=11001
    xor       pdst, dst, src1, src2    OP=11010
    not       pdst, dst, src1          OP=11011

Opcode: base opcode, OP as above

Operation:

    if (op == and) result = src1 & src2;
    if (op == or) result = src1 | src2;
    if (op == xor) result = src1 ^ src2;
    if (op == not) result = ~src1;
    dst = result;
    pdst = result & 1;

Execution time: 1 cycle

Predicate output: bit 0 of the result


Minmax instructions: min, max 


These instructions perform the signed min/max operations. 


Instructions:

    min       pdst, dst, src1, src2    OP=11101    [VP3+ only]
    max       pdst, dst, src1, src2    OP=11110    [VP3+ only]

Opcode: base opcode, OP as above

Operation:

    if (op == min) which = (SEX(src2) < SEX(src1));
    if (op == max) which = (SEX(src2) >= SEX(src1));
    dst = (which ? src2 : src1);
    pdst = which;

Execution time: 1 cycle

Predicate output: 0 if src1 is selected as the result, 1 if src2 is selected
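In the sketch below (hypothetical `minmax` and `sex16` helpers, 16-bit register values assumed), note how ties are resolved: on equal inputs min keeps src1 (pdst = 0) while max picks src2 (pdst = 1), as the `<` and `>=` in the pseudocode imply.

```python
def sex16(x):
    """Stand-in for SEX(): read a 16-bit register value as signed."""
    x &= 0xffff
    return x - 0x10000 if x & 0x8000 else x


def minmax(op, src1, src2):
    """Return (dst, pdst) for min/max; pdst is 1 when src2 is selected."""
    if op == "min":
        which = sex16(src2) < sex16(src1)
    elif op == "max":
        which = sex16(src2) >= sex16(src1)
    else:
        raise ValueError("unknown op: " + op)
    return (src2 if which else src1) & 0xffff, int(which)
```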
Predicate instructions: and, or, xor


These instructions perform the corresponding logical ops on $p registers. Note that one or both inputs can be negated, as mentioned in the psrc1/psrc2 operand description.

Instructions:

    and       spdst, psrc1, psrc2    OP=xxx00
    or        spdst, psrc1, psrc2    OP=xxx01
    xor       spdst, psrc1, psrc2    OP=xxx10

Opcode: special opcode with OC=010, OP as above. Note that bits 2 and 3 of OP are used for the psrc1 and psrc2 negation flags.

Operation:

    if (op == and) spdst = psrc1 & psrc2;
    if (op == or) spdst = psrc1 | psrc2;
    if (op == xor) spdst = psrc1 ^ psrc2;

Execution time: 1 cycle


No operation: nop 


Does nothing. 
Instructions:

    nop       OP=xxx11

Opcode: special opcode with OC=010, OP as above

Operation:

    /* nothing */

Execution time: 1 cycle


Long multiplication instructions: lmulu, lmuls

These instructions perform signed and unsigned 16x11 -> 32 bit multiplication. src1 holds the 16-bit source, while the low 11 bits of src2 hold the 11-bit source. The result is written to $lhi:$llo.

Instructions:

    lmulu     src1, src2    OP=00000
    lmuls     src1, src2    OP=00001

Opcode: special opcode with OC=101, OP as above

Operation:

    if (op == lmulu) {
        result = src1 * (src2 & 0x7ff);
    } else if (op == lmuls) {
        /* sign extension from 11-bit number */
        s2 = src2 & 0x7ff;
        if (s2 & 0x400)
            s2 -= 0x800;
        result = SEX(src1) * s2;
    }
    $llo = result;
    $lhi = result >> 16;

Execution time: 3 cycles

Execution unit conflicts: lmulu, lmuls, lsrr, ladd, lsar, ldivu
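The following Python sketch models lmulu/lmuls and the split of the 32-bit product into $lhi:$llo, assuming 16-bit register values. `lmul` and `sex16` are illustrative names, and the function returns the pair ($lhi, $llo) instead of writing registers.

```python
def sex16(x):
    """Stand-in for SEX(): read a 16-bit register value as signed."""
    x &= 0xffff
    return x - 0x10000 if x & 0x8000 else x


def lmul(op, src1, src2):
    """Return ($lhi, $llo) after lmulu/lmuls (16x11 -> 32 bit)."""
    s2 = src2 & 0x7ff                 # only the low 11 bits of src2 are used
    if op == "lmulu":
        result = (src1 & 0xffff) * s2
    elif op == "lmuls":
        if s2 & 0x400:                # sign-extend the 11-bit operand
            s2 -= 0x800
        result = sex16(src1) * s2
    else:
        raise ValueError("unknown op: " + op)
    return result >> 16 & 0xffff, result & 0xffff
```

For instance, lmuls of 0xffff (-1) by 0x7ff (-1 as an 11-bit number) yields $lhi = 0, $llo = 1, while lmulu of the same operands yields 0xffff * 0x7ff = 0x07fef801.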


Long arithmetic unary instructions: lsrr, ladd, lsar, ldivu

These instructions operate on the 32-bit quantity in $lhi:$llo. ladd adds a signed 16-bit quantity to it. lsar shifts it right arithmetically by a given amount. ldivu does an unsigned 32/16 -> 32 division. lsrr divides it by 2^(src2 + 1), rounding to nearest with ties rounded up.

Instructions:

    lsrr      src2    OP=00010
    ladd      src2    OP=00100    [VP3+ only]
    lsar      src2    OP=01000    [VP3+ only]
    ldivu     src2    OP=01100    [VP4 only]

Opcode: special opcode with OC=101, OP as above

Operation:

    val = SEX($lhi) << 16 | $llo;
    if (op == lsrr) {
        bit = src2 & 0x1f;
        val += 1 << bit;
        val >>= (bit + 1);
    }
    if (op == ladd) val += SEX(src2);
    if (op == lsar) val >>= src2 & 0x1f;
    if (op == ldivu) {
        val &= 0xffffffff;
        if (src2)
            val /= src2;
        else
            val = 0xffffffff;
    }
    $llo = val;
    $lhi = val >> 16;

Execution time:

* lsrr: 1 cycle
* ladd: 1 cycle
* lsar: 1 cycle
* ldivu: 34 cycles

Execution unit conflicts: lmulu, lmuls, lsrr, ladd, lsar, ldivu
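Below is a minimal Python model of these instructions, operating on a ($lhi, $llo) pair and returning the updated pair. It assumes src2 is a 16-bit register value and treats the ldivu divisor as unsigned 16-bit; `long_op` and `sex16` are hypothetical names. The lsrr branch shows the rounding trick: adding 2^bit before shifting by bit + 1 rounds to nearest with ties up, so 5 / 2 gives 3.

```python
def sex16(x):
    """Stand-in for SEX(): read a 16-bit register value as signed."""
    x &= 0xffff
    return x - 0x10000 if x & 0x8000 else x


def long_op(op, lhi, llo, src2):
    """Return the new ($lhi, $llo) after lsrr/ladd/lsar/ldivu."""
    val = sex16(lhi) << 16 | (llo & 0xffff)
    if op == "lsrr":
        bit = src2 & 0x1f
        val += 1 << bit               # add half the divisor: round to nearest, ties up
        val >>= bit + 1               # divide by 2^(bit + 1)
    elif op == "ladd":
        val += sex16(src2)
    elif op == "lsar":
        val >>= src2 & 0x1f           # Python >> on ints is arithmetic
    elif op == "ldivu":
        val &= 0xffffffff
        divisor = src2 & 0xffff
        val = val // divisor if divisor else 0xffffffff
    else:
        raise ValueError("unknown op: " + op)
    return val >> 16 & 0xffff, val & 0xffff
```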


Control flow instructions: bra, call, ret 


Todo: write me 


* Flow:

    0x00: [bra TARGET]
        bra IMM?
        Branch to address. Delay: 1 instruction
    0x02: [call TARGET]
        call IMM?
        XXX: stack and calling convention
    0x03: [ret]
        ret

Todo: delay (blob: 1)

XXX: stack and calling convention


Memory access instructions: ld, st


These instructions load and store values from/to one of the memory spaces available to the vuc microprocessor. The 
exact semantics of such operation depend on the space being accessed. 
Instructions:

    st    space[dst + src1 * 2], src2    OP=xxxx0    [if IMMF is 0]
    st    space[src1 + stoff], src2      OP=xxxx0    [if IMMF is 1]
    ld    dst, space[src1 + ldoff]       OP=xxxx1    [if IMMF is 0]
    ld    dst, space[src1 + src2]        OP=xxxx1    [if IMMF is 1]

Opcode: special opcode with OC=100, OP as above. Note that bits 1-4 of OP are used to select the memory space.

Operation:

    if (op == st)
        space.STORE(address, src2);
    else
        dst = space.LOAD(address);

Execution time:

* ld: 3 cycles
* st: 1 cycle


The scratch special registers 


The vuc has two 16-bit scratch registers that may be used for communication between the vuc and the host [xtensa/falcon code counts as the host in this case]. One of them is for the host -> vuc direction, the other for vuc -> host.


The host -> vuc register is called $h2v. It’s RW on the host side, RO on vuc side. Writing this register causes bit 11 of 
$stat register to light up and stay up until $h2v is read on vuc side. 


$sr4/$h2v: host->vuc 16-bit scratch register. Reading this register will clear bit 11 of $stat. This register is read-only.

BAR0 0x103290 / XLMI 0x0a400: H2V [VP2]
BAR0 0x085450 / I[0x11400]: H2V [VP3+]
    A read-write alias of $h2v. Does not clear $stat bit 11 when read. Writing sets bit 11 of $stat.

$stat bit 11: $h2v write pending. This bit is set when H2V is written by the host, and cleared when $h2v is read by the vuc.


The vuc -> host register is called $v2h. It's RW on the vuc side, RO on the host side. Writing this register causes an interrupt to be triggered.

$sr5/$v2h: vuc->host 16-bit scratch register, read-write. Writing this register will trigger the V2H vuc interrupt.

BAR0 0x103294 / XLMI 0x0a500: V2H [VP2]
BAR0 0x085454 / I[0x11500]: V2H [VP3+]
    A read-only alias of $v2h.


The $stat special register 


Every bit in this register performs a different function. All of them can be read. For the ones that can be written, value 0 serves as a noop, while value 1 triggers some operation.


$sr6/$stat: Control and status register.

    bit 0: VPRING_DEBLOCK buffer 0 write trigger [vdec/vuc/vpring.txt]
    bit 1: VPRING_DEBLOCK buffer 1 write trigger [vdec/vuc/vpring.txt]
    bit 2: VPRING_CTRL buffer 0 write trigger [vdec/vuc/vpring.txt]
    bit 3: VPRING_CTRL buffer 1 write trigger [vdec/vuc/vpring.txt]
    bit 4: ??? [XXX]
    bit 5: mvsread done status [vdec/vuc/mvsurf.txt]
    bit 6: MVSURF_OUT full status [vdec/vuc/mvsurf.txt]
    bit 7: mvswrite busy status [vdec/vuc/mvsurf.txt]
    bit 8: ??? [XXX]
    bit 9: ??? [XXX]
    bit 10: macroblock input available [vdec/vuc/vreg.txt]
    bit 11: $h2v write pending [vdec/vuc/isa.txt]
    bit 12: watchdog triggered [vdec/vuc/isa.txt]
    bit 13 [VP4+?]: ??? [XXX]
    bit 14: user-controlled pulse PCOUNTER signal [vdec/vuc/perf.txt]
    bit 15: user-controlled continuous PCOUNTER signal [vdec/vuc/perf.txt]


Three special instructions are available that read $stat implicitly. The sleep instruction switches to a low-power sleep mode until bit 10 or bit 11 is set. The wstc instruction does a busy-wait until a selected bit in $stat goes to 0; wsts likewise waits until a selected bit goes to 1.


On VP3+, a read-only alias of $stat is available in the MMIO space:

BAR0 0x0854bc / I[0x12f00]: STAT
    Aliases the $stat vuc register, read only.


Sleep instruction: sleep 


This instruction waits until a full macroblock has been read from the MBRING [ie. $stat bit 10 is set] or the host writes the $h2v register [ie. $stat bit 11 is set]. While this instruction is waiting, the vuc microprocessor goes into a low power mode and sends 0 on its "busy" signal, thus counting as idle.

Instructions:

    sleep    OP=00100

Opcode: special opcode with OC=001, OP as above

Operation:

    while (!($stat & 0xc00)) idle();

Execution time: as long as necessary, at least 1 cycle, blocks subsequent instructions until finished


Wait for status bit instructions: wstc, wsts 


These instructions wait for a given $stat bit to become 0 [wstc] or 1 [wsts]. Execution of all subsequent instructions is 
delayed until this happens. 


Instructions:

    wstc    imm4    OP=00101
    wsts    imm4    OP=00110

Opcode: special opcode with OC=001, OP as above

Operation:

    while (($stat >> imm4 & 1) != (op == wsts));

Execution time: as long as necessary, at least 1 cycle, blocks subsequent instructions until finished
The watchdog counter


Todo: write me 


Clear watchdog counter instruction: clicnt 


Todo: write me 


Misc special registers 


This section describes various special registers that don't fit anywhere else. 
$sr8/$pc: The program counter. When read, always returns the address of the instruction doing the read.

BAR0 0x10329c / XLMI 0x0a700: PC [VP2]
BAR0 0x08545c / I[0x11700]: PC [VP3+]
    A host-accessible alias of $pc. Shows the address of the currently executing instruction.

$sr12/$lhi: long arithmetic high word register
$sr13/$llo: long arithmetic low word register


These two registers together make a 32-bit quantity used in long arithmetic operations - see the documentation of the long arithmetic instructions for details. These registers may be read after long arithmetic instructions to get their results. On VP3+, these registers may be written manually; on VP2, they're read-only and only modifiable by long arithmetic instructions.


$sr14/$pred: predicate register file alias

This register aliases the $p register file - bit X corresponds to $pX. The bits behave like the corresponding $p registers - bit 15 is read-only and always 1, while bit 1 is read-only and is always the negation of bit 0.


VP2/VP3/VP4 vuc MVSURF 


* MVSURF format
* MVSURF_OUT setup
* MVSURF_IN setup
* MVSO[] address space
* MVSI[] address space
* Writing MVSURF: mvswrite
* Reading MVSURF: mvsread


Introduction 


H.264, VC-1 and MPEG4 all support a "direct" prediction mode where the forward and backward motion vectors for a macroblock are calculated from the co-located motion vector in the reference picture and the relative ordering of the pictures. To implement it in vuc, intermediate storage of motion vectors and some related data is required. This storage is called MVSURF.
A single MVSURF object stores data for a single frame, or for two fields. Each macroblock takes 0x40 bytes in the MVSURF. The macroblocks in MVSURF are first grouped into macroblock pairs, just like in H.264 MBAFF frames. If the MVSURF corresponds to a single field, one macroblock of each pair is just left unused. The pairs are then stored in the MVSURF ordered first by X coordinate, then by Y coordinate, with no gaps.
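Under the layout just described, the byte offset of a frame macroblock can be computed as in this sketch. It assumes that "ordered first by X coordinate" means X varies fastest, that the two macroblocks of a pair are vertically adjacent as in MBAFF, and that the frame is `width_mbs` macroblocks wide; `mvsurf_mb_offset` is a hypothetical helper, not part of any driver.

```python
MB_BYTES = 0x40  # one macroblock entry = 0x10 32-bit words


def mvsurf_mb_offset(width_mbs, mb_x, mb_y):
    """Byte offset of macroblock (mb_x, mb_y) from the start of the MVSURF."""
    pair_row, bottom = mb_y >> 1, mb_y & 1    # vertically adjacent pair
    pair_index = pair_row * width_mbs + mb_x  # X varies fastest, then Y
    return (pair_index * 2 + bottom) * MB_BYTES
```

For a frame 3 macroblocks wide, the bottom macroblock of the second pair (mb_x = 1, mb_y = 1) lands at 0xc0.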


The vuc has two MVSURF access ports: MVSURF_IN for reading the MVSURF of a reference picture [first picture in the L1 list for H.264, the most recent I or P picture for VC-1 and MPEG4], and MVSURF_OUT for writing the MVSURF of the current picture. Usage of both ports is optional - if there's no reason to use one of them [MVSURF_IN in a non-B picture, or MVSURF_OUT in a non-reference picture], it can just be ignored.


Both MVSURF_IN and MVSURF_OUT have to be set up via MMIO registers before use. To write data to MVSURF_OUT, it first has to be stored by the vuc into the MVSO[] memory space, then the mvswrite instruction executed [while making sure the previous mvswrite instruction, if any, has already completed]. Reading MVSURF_IN is done by executing the mvsread instruction, waiting for its completion, then reading the MVSI[] memory space [or letting it be read implicitly by the vuc fixed-function hardware].


Note that MVSURF_OUT writes in units of macroblocks, while MVSURF_IN reads in units of macroblock pairs - see details below.


A single MVSURF entry, corresponding to a single macroblock, consists of:

* for the whole macroblock:
  - frame/field flag [1 bit]: for H.264, 1 if mb_field_decoding_flag is set or in a field picture; for MPEG4, 1 if field-predicted macroblock
  - inter/intra flag [1 bit]: 1 for intra macroblocks
* for each partition:
  - RPI [5 bits]: the persistent id of the reference picture used for this partition and the top/bottom field selector, if applicable - same as the $rpil0/$rpil1 value.
* for each subpartition of each partition:
  - X component of motion vector [14 bits]
  - Y component of motion vector [12 bits]
  - zero flag [1 bit]: set if both components of the motion vector are in the -1..1 range and refidx [not RPI] is 0 - a partial term used in the H.264 colZeroFlag computation

For H.264, the RPI and motion vector are from the partition's L0 prediction if present, L1 otherwise. Since vuc was originally designed for H.264, a macroblock is always considered to be made of 4 partitions, which in turn are made of 4 subpartitions each - if a macroblock is more coarsely subdivided, each piece of data is duplicated for all covered 8x8 partitions and 4x4 subpartitions. Partitions and subpartitions are indexed in the same way as for $spidx.


MVSURF format 


A single macroblock is represented by 0x10 32-bit LE words in MVSURF. Each word has the following format [i refers to the word index, 0-15]:

* bits 0-13, each word: X component of motion vector for subpartition i
* bits 14-25, each word: Y component of motion vector for subpartition i
* bits 26-30, words 0, 4, 8, 12: RPI for partition i>>2
* bit 26, words 1, 5, 9, 13: zero flag for subpartition i-1
* bit 27, words 1, 5, 9, 13: zero flag for subpartition i
* bit 28, words 1, 5, 9, 13: zero flag for subpartition i+1
* bit 29, words 1, 5, 9, 13: zero flag for subpartition i+2
* bit 26, word 15: frame/field flag for the macroblock
* bit 27, word 15: inter/intra flag for the macroblock
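As an illustration of the word layout, the sketch below packs a macroblock into the 0x40-byte MVSURF representation and decodes it back. It assumes little-endian 32-bit words (as stated above) and two's complement motion vector components; `pack_mvsurf_mb`, `unpack_mvsurf_mb` and `sign_extend` are hypothetical helpers.

```python
import struct


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return value - (sign << 1) if value & sign else value


def pack_mvsurf_mb(mvx, mvy, rpi, zero, field, intra):
    """Pack one macroblock: mvx/mvy/zero have 16 entries, rpi has 4."""
    words = [0] * 16
    for i in range(16):
        words[i] |= (mvx[i] & 0x3fff) | (mvy[i] & 0xfff) << 14   # bits 0-13, 14-25
        words[(i & 0xc) | 1] |= (zero[i] & 1) << (26 + (i & 3))  # bits 26-29
    for p in range(4):
        words[p * 4] |= (rpi[p] & 0x1f) << 26                    # bits 26-30
    words[15] |= (field & 1) << 26 | (intra & 1) << 27
    return struct.pack("<16I", *words)


def unpack_mvsurf_mb(data):
    """Inverse of pack_mvsurf_mb."""
    words = struct.unpack("<16I", data)
    mvx = [sign_extend(w, 14) for w in words]
    mvy = [sign_extend(w >> 14, 12) for w in words]
    zero = [words[(i & 0xc) | 1] >> (26 + (i & 3)) & 1 for i in range(16)]
    rpi = [words[p * 4] >> 26 & 0x1f for p in range(4)]
    return mvx, mvy, rpi, zero, words[15] >> 26 & 1, words[15] >> 27 & 1
```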


MVSURF_OUT setup


The MVSURF_OUT has three different output modes:

* field picture output mode: each write writes one MVSURF macroblock and skips one MVSURF macroblock, each line is passed once
* MBAFF frame picture output mode: each write writes one MVSURF macroblock, each line is passed once
* non-MBAFF frame picture output mode: each write writes one MVSURF macroblock and skips one macroblock, each line is passed twice, with the first pass writing even-numbered macroblocks and the second pass writing odd-numbered macroblocks

Example write order [MVSURF macroblock indices]:

    field: 0, 2, 4, 6, 8, 10 or 1, 3, 5, 7, 9, 11
    MBAFF frame: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
    non-MBAFF frame: 0, 2, 4, 1, 3, 5, 6, 8, 10, 7, 9, 11


The following registers control MVSURF_OUT behavior:

BAR0 0x1032f0 / XLMI 0x0bc00: MVSURF_OUT_OFFSET [VP2]
    The offset of MVSURF_OUT from the start of the MEMIF MVSURF port. The offset is in bytes and has to be divisible by 0x40.

BAR0 0x085490 / I[0x12400]: MVSURF_OUT_ADDR [VP3+]
    The address of MVSURF_OUT in falcon port #2, shifted right by 8 bits.

BAR0 0x1032f4 / XLMI 0x0bd00: MVSURF_OUT_PARM [VP2]
BAR0 0x085494 / I[0x12500]: MVSURF_OUT_PARM [VP3+]
    bits 0-7: WIDTH, length of a single pass in writable macroblocks
    bit 8: MBAFF_FRAME_FLAG, 1 if MBAFF frame picture mode enabled
    bit 9: FIELD_PIC_FLAG, 1 if field picture mode enabled

    If neither bit 8 nor 9 is set, non-MBAFF frame picture mode is used. Bit 8 and bit 9 shouldn't be set at the same time.

BAR0 0x1032f8 / XLMI 0x0be00: MVSURF_OUT_LEFT [VP2]
BAR0 0x085498 / I[0x12600]: MVSURF_OUT_LEFT [VP3+]
    bits 0-7: X, the number of writable macroblocks left in the current pass
    bits 8-15: Y, the number of passes left, including the current pass

BAR0 0x1032fc / XLMI 0x0bf00: MVSURF_OUT_POS [VP2]
BAR0 0x08549c / I[0x12700]: MVSURF_OUT_POS [VP3+]
    bits 0-12: MBADDR, the index of the current macroblock from the start of MVSURF
    bit 13: PASS_ODD, 1 if the current pass is an odd-numbered pass


All of these registers are RW by the host. The LEFT and POS registers are also modified by the hardware when it writes macroblocks.

The whole write operation is divided into so-called "passes", which correspond to a line of macroblocks [field, non-MBAFF frame] or half a line [MBAFF frame]. When a macroblock is written to the MVSURF, it's written at the position indicated by POS.MBADDR, LEFT.X is decremented by 1, and POS.MBADDR is incremented by 1 [MBAFF frame] or 2 [field, non-MBAFF frame]. If this causes LEFT.X to drop to 0, a new pass is started, as follows:

* LEFT.X is reset to PARM.WIDTH
* LEFT.Y is decreased by 1
* POS.PASS_ODD is flipped
* if non-MBAFF frame picture mode is in use:
  - if PASS_ODD is 1, POS.MBADDR is decreased by PARM.WIDTH * 2 and bit 0 is set to 1
  - otherwise [PASS_ODD is 0], POS.MBADDR bit 0 is set to 0

When either LEFT.X or LEFT.Y is 0, writes to MVSURF_OUT are ignored.
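The pass rules can be checked by simulating the LEFT/POS registers. This Python sketch models the rules as described, not the hardware: it assumes POS.PASS_ODD starts at 0, wraps POS.MBADDR to its 13 bits, and uses hypothetical mode names ("field", "mbaff", and "frame" for non-MBAFF). With the parameters below it reproduces the example write orders listed for the three modes.

```python
def mvsurf_out_addresses(width, passes, mode, start=0):
    """Return the POS.MBADDR values written by successive mvswrite calls.

    width: PARM.WIDTH; passes: initial LEFT.Y; start: initial POS.MBADDR.
    """
    left_x, left_y = width, passes
    mbaddr, pass_odd = start, 0
    step = 1 if mode == "mbaff" else 2
    written = []
    while left_x and left_y:                 # writes are ignored once either is 0
        written.append(mbaddr)
        left_x -= 1
        mbaddr = (mbaddr + step) & 0x1fff    # MBADDR is 13 bits wide
        if left_x == 0:                      # start a new pass
            left_x = width
            left_y -= 1
            pass_odd ^= 1
            if mode == "frame":
                if pass_odd:
                    mbaddr = ((mbaddr - width * 2) & 0x1fff) | 1
                else:
                    mbaddr &= ~1
    return written


print(mvsurf_out_addresses(3, 4, "frame"))
# [0, 2, 4, 1, 3, 5, 6, 8, 10, 7, 9, 11]
```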


The MVSURF_OUT port has an output buffer of about 4 macroblocks - mvswrite will queue data into that buffer, and it'll auto-flush as MEMIF bandwidth allows. To determine whether the buffer is full [ie. whether it's safe to queue any more data with mvswrite], use $stat bit 6:

$stat bit 6: MVSURF_OUT buffer full - no more space is currently available for writes; the mvswrite instruction will be ignored and shouldn't be attempted until this bit drops to 0 [when MEMIF accepts more data].


MVSURF_IN setup

The MVSURF_IN has two input modes:


* interlaced mode: used for field and MBAFF frame pictures, each read reads one macroblock pair, each line is 
passed once 


* progressive mode: used for non-MBAFF frame pictures, each read reads one macroblock pair, each line is 
passed twice 


Example read order [MVSURF macroblock pairs]:

    interlaced: 0&1, 2&3, 4&5, 6&7, 8&9, 10&11
    progressive: 0&1, 2&3, 4&5, 0&1, 2&3, 4&5, 6&7, 8&9, 10&11, 6&7, 8&9, 10&11

The following registers control MVSURF_IN behavior:


BAR0 0x1032e0 / XLMI 0x0b800: MVSURF_IN_OFFSET [VP2]
    The offset of MVSURF_IN from the start of the MEMIF MVSURF port. The offset is in bytes and has to be divisible by 0x40.

BAR0 0x085480 / I[0x12000]: MVSURF_IN_ADDR [VP3+]
    The address of MVSURF_IN in falcon port #2, shifted right by 8 bits.

BAR0 0x1032e4 / XLMI 0x0b900: MVSURF_IN_PARM [VP2]
BAR0 0x085484 / I[0x12100]: MVSURF_IN_PARM [VP3+]
    bits 0-7: WIDTH, length of a single line in macroblock pairs
    bit 8: PROGRESSIVE, 1 if progressive mode enabled, 0 if interlaced mode enabled

BAR0 0x1032e8 / XLMI 0x0ba00: MVSURF_IN_LEFT [VP2]
BAR0 0x085488 / I[0x12200]: MVSURF_IN_LEFT [VP3+]
    bits 0-7: X, the number of macroblock pairs left in the current line
    bits 8-15: Y, the number of lines left, including the current line

BAR0 0x1032ec / XLMI 0x0bb00: MVSURF_IN_POS [VP2]
BAR0 0x08548c / I[0x12300]: MVSURF_IN_POS [VP3+]
    bits 0-11: MBPADDR, the index of the current macroblock pair from the start of MVSURF
    bit 12: PASS, 0 for first pass, 1 for second pass


All of these registers are RW by the host. The LEFT and POS registers are also modified by the hardware when it reads macroblocks.


The read operation is divided into lines. In interlaced mode, each line is read once; in progressive mode, each line is read twice. A single read of a line is called a pass. When a macroblock pair is read, it's read from the position indicated by POS.MBPADDR, LEFT.X is decremented by 1, and POS.MBPADDR is incremented by 1. If this causes LEFT.X to drop to 0, a new line or a new pass over the same line is started:

* LEFT.X is reset to PARM.WIDTH
* if progressive mode is in use and POS.PASS is 0:
  - POS.PASS is set to 1
  - POS.MBPADDR is decreased by PARM.WIDTH
* otherwise [interlaced mode is in use or POS.PASS is 1]:
  - POS.PASS is set to 0
  - LEFT.Y is decremented by 1

When either LEFT.X or LEFT.Y is 0, reads from MVSURF_IN will fail and won't affect the MVSURF_IN registers in any way.
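As with MVSURF_OUT, the line/pass rules can be checked with a small register simulation. This is a Python sketch of the rules above under stated assumptions (POS.PASS starts at 0, POS.MBPADDR wraps at 12 bits); `mvsurf_in_pairs` is a hypothetical name. Each returned pair index p covers MVSURF macroblocks 2p and 2p+1.

```python
def mvsurf_in_pairs(width, lines, progressive, start=0):
    """Return the POS.MBPADDR values read by successive mvsread calls.

    width: PARM.WIDTH; lines: initial LEFT.Y; start: initial POS.MBPADDR.
    """
    left_x, left_y = width, lines
    mbpaddr, second_pass = start, 0
    read = []
    while left_x and left_y:                  # reads fail once either is 0
        read.append(mbpaddr)
        left_x -= 1
        mbpaddr = (mbpaddr + 1) & 0xfff       # MBPADDR is 12 bits wide
        if left_x == 0:
            left_x = width
            if progressive and not second_pass:
                second_pass = 1               # re-read the same line
                mbpaddr = (mbpaddr - width) & 0xfff
            else:
                second_pass = 0               # move on to the next line
                left_y -= 1
    return read


pairs = mvsurf_in_pairs(3, 2, progressive=True)
print(", ".join(f"{2 * p}&{2 * p + 1}" for p in pairs))
# 0&1, 2&3, 4&5, 0&1, 2&3, 4&5, 6&7, 8&9, 10&11, 6&7, 8&9, 10&11
```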

The MVSURF_IN port has an input buffer of 2 macroblock pairs. It will attempt to fill this buffer as soon as it's possible to read a macroblock pair [ie. LEFT.X and LEFT.Y are non-0]. For this reason, LEFT must always be the last register to be written when setting up MVSURF_IN. In addition, this makes it impossible to seamlessly switch to a new MVSURF_IN buffer without reading the previous one until the end.


The MVSURF_IN always operates on units of macroblock pairs. This means that the following special handling is necessary:

* field pictures: use interlaced mode, execute mvsread for each processed macroblock
* MBAFF frame pictures: use interlaced mode, execute mvsread for each processed macroblock pair [when starting to process the first macroblock in the pair]
* non-MBAFF frame pictures: use progressive mode, execute mvsread for each processed macroblock

In all cases, care must be taken to use the right macroblock from the pair in computations.


MVSO[] address space 


MVSO[] is a write-only memory space consisting of 0x80 16-bit cells. Every address in the range 0-0x7f corresponds to one cell. However, not all cells and not all bits of each cell are actually used. The usable cells are:

* MVSO[i * 8 + 0], i in 0..15: X component of motion vector for subpartition i
* MVSO[i * 8 + 1], i in 0..15: Y component of motion vector for subpartition i
* MVSO[i * 0x20 + j * 8 + 2], i in 0..3, j in 0..3: RPI of partition i, j is ignored
* MVSO[i * 8 + 3], i in 0..15: the "zero flag" for subpartition i
* MVSO[i * 8 + 4], i in 0..15: macroblock flags, i is ignored:
  - bit 0: frame/field flag
  - bit 1: inter/intra flag
* MVSO[i * 8 + 5], i in 0..15: macroblock partitioning schema, same format as the $mbpart register, i is ignored [10 bits used]

If the address of some datum has some ignored fields, writing to any two addresses with only the ignored fields differing will actually access the same data.


MVSI[] address space 


MVSI[] is a read-only memory space consisting of 0x100 16-bit cells. Every address in the range 0-0xff corresponds to one cell. The cells are:

* MVSI[mb * 0x80 + i * 8 + 0], i in 0..15: X component of motion vector for subpartition i [sign extended to 16 bits]
* MVSI[mb * 0x80 + i * 8 + 1], i in 0..15: Y component of motion vector for subpartition i [sign extended to 16 bits]
* MVSI[mb * 0x80 + i * 0x20 + j * 8 + 2], i in 0..3, j in 0..3: RPI of partition i, j is ignored
* MVSI[mb * 0x80 + i * 8 + 3], i in 0..15: the "zero flag" for subpartition i
* MVSI[mb * 0x80 + i * 8 + 4 + j], i in 0..15, j in 0..3: macroblock flags, i and j are ignored:
  - bit 0: frame/field flag
  - bit 1: inter/intra flag

mb is 0 for the top macroblock in the pair, 1 for the bottom macroblock.

If the address of some datum has some ignored fields, all such addresses will alias and read the same datum.
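The MVSI[] layout and its aliasing can be summarised as an address decoder. The sketch below is a hypothetical helper that returns (mb, datum, index) for a cell address; the datum names ("mvx", "mvy", "rpi", "zero", "flags") are made up for illustration.

```python
def decode_mvsi_address(addr):
    """Return (mb, datum, index) for an MVSI[] cell address.

    mb is 0 for the top and 1 for the bottom macroblock of the pair; index is a
    subpartition (mvx, mvy, zero), a partition (rpi) or None (flags).
    """
    if not 0 <= addr <= 0xff:
        raise ValueError("MVSI[] has 0x100 cells")
    mb = addr >> 7
    sub = addr >> 3 & 0xf           # the i * 8 part of the address
    slot = addr & 7
    if slot == 0:
        return mb, "mvx", sub
    if slot == 1:
        return mb, "mvy", sub
    if slot == 2:
        return mb, "rpi", sub >> 2  # j (low two bits of sub) is ignored
    if slot == 3:
        return mb, "zero", sub
    return mb, "flags", None        # slots 4-7: i and j are ignored
```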


Note that, aside from explicit loads from MVSI[], the MVSI[] data is also implicitly accessed by some fixed-function vuc hardware to calculate MV predictors and other values.
Writing MVSURF: mvswrite


Data is sent to MVSURF_OUT via the mvswrite instruction. A single invocation of mvswrite writes a single macroblock. The data is gathered from the MVSO[] space. mvswrite is aware of macroblock partitioning and will use the partitioning schema to gather data from the right cells of MVSO[] - for instance, if 16x8 macroblock partitioning is used, only subpartitions 0 and 8 are used, and their data is duplicated for all 8x8/4x4 blocks they cover.


This instruction should not be used if the MVSURF_OUT output buffer is currently full - the code should execute a wstc instruction on $stat bit 6 beforehand.


Note that this instruction takes 17 cycles to gather the data from MVSO[] space - in that time, MVSO[] contents 
shouldn't be modified. On cycles 1-16 of execution, $stat bit 7 will be lit up: 


$stat bit 7: mvswrite MVSO[] data gathering in progress - this bit is set at the end of cycle 1 of mvswrite execution, and cleared at the end of cycle 17 of mvswrite execution, ie. when it's safe to modify MVSO[]. Note that this means that the instruction right after mvswrite will still read 0 in this bit - to wait for mvswrite completion, use the mvswrite; nop; wstc 7 sequence. This bit going back to 0 doesn't mean that the MVSURF write is complete - it merely means that the data has been gathered and queued for a write through the MEMIF.


If execution of this instruction causes the MVSURF_OUT buffer to become full, bit 6 of $stat is set to 1 on the same cycle as bit 7.


Instructions: mvswrite

Opcode: special opcode, OP=01010, OC=001

Operation:

    b32 tmp[0x10] = { 0 };
    if (MVSURF_OUT.no_space_left())
        return; /* buffer full: the instruction is ignored */
    $stat[7] = 1; /* cycle 1 */
    if (MVSURF_OUT.full_after_next_mb())
        $stat[6] = 1;
    b2 partlut[4] = { 0, 2, 1, 3 };
    b10 mbpart = MVSO[5];
    for (i = 0; i < 0x10; i++) {
        pidx = i >> 2;
        pmask = partlut[mbpart & 3];
        spmask = pmask << 2 | partlut[mbpart >> (pidx * 2 + 2) & 3];
        mpidx = pidx & pmask;
        mspidx = i & spmask;
        tmp[i] |= MVSO[mspidx * 8 + 0] | MVSO[mspidx * 8 + 1] << 14;
        tmp[(i & 0xc) | 1] |= MVSO[mspidx * 8 + 3] << (26 + (i & 3));
    }
    for (i = 0; i < 4; i++) {
        pidx = i; /* i is the partition index here */
        pmask = partlut[mbpart & 3];
        mpidx = pidx & pmask;
        tmp[i * 4] |= MVSO[mpidx * 0x20 + 2] << 26;
    }
    tmp[0xf] |= MVSO[4] << 26;
    $stat[7] = 0; /* cycle 17 */
    MVSURF_OUT.write(tmp);

Execution time: 18 cycles [submission to the MVSURF_OUT port only, doesn't include the time needed by MVSURF_OUT to actually flush the data to memory]

Execution unit conflicts: mvswrite
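The gathering step of mvswrite can be modelled in Python as below. This is a sketch derived from the pseudocode, not a hardware reference: it assumes each MVSO[] cell holds only its used low bits (14/12/5/1/2 bits) and reuses the pseudocode's partlut table, under which partitioning mode 1 keeps only partition bit 1 (the top/bottom split). `mvswrite_gather` and `PARTLUT` are hypothetical names. The RPI loop indexes words 0, 4, 8 and 12 by partition number, following the MVSURF format description.

```python
PARTLUT = [0, 2, 1, 3]  # the pseudocode's partlut: 2-bit mode -> index mask


def mvswrite_gather(mvso):
    """Build the 16 MVSURF words that mvswrite would queue from MVSO[].

    mvso: list of 0x80 ints modelling the MVSO[] cells.
    """
    tmp = [0] * 16
    mbpart = mvso[5]
    pmask = PARTLUT[mbpart & 3]
    for i in range(16):
        pidx = i >> 2
        spmask = pmask << 2 | PARTLUT[mbpart >> (pidx * 2 + 2) & 3]
        mspidx = i & spmask                 # subpartition whose data is reused
        tmp[i] |= mvso[mspidx * 8 + 0] | mvso[mspidx * 8 + 1] << 14
        tmp[(i & 0xc) | 1] |= mvso[mspidx * 8 + 3] << (26 + (i & 3))
    for p in range(4):                      # one RPI per partition
        mpidx = p & pmask
        tmp[p * 4] |= mvso[mpidx * 0x20 + 2] << 26
    tmp[15] |= mvso[4] << 26                # frame/field and inter/intra flags
    return tmp
```

With mode 1 and no sub-partitioning, subpartitions 0-7 all receive the vector stored for subpartition 0 and 8-15 the one stored for subpartition 8, matching the 16x8 example in the text.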
Reading MVSURF: mvsread


Data is read from MVSURF_IN via the mvsread instruction. A single invocation of mvsread reads a single macroblock pair. The data is stored into the MVSI[] space.

Since MVSURF resides in VRAM, which doesn't have a deterministic access time, this instruction may take an arbitrarily long time to complete the read. The read is done asynchronously and a $stat bit is provided to let the microcode know when it's finished:


$stat bit 5: mvsread MVSI[] write done - this bit is cleared on cycle 1 of mvsread execution and set by the mvsread instruction once data for a complete macroblock pair has been read and stored into MVSI[]. Note that this means that the instruction right after mvsread may still read 1 in this bit - to wait for mvsread completion, use the mvsread; nop; wsts 5 sequence. Also note that if the read fails because one of the MVSURF_IN_LEFT fields is 0, this bit will never become 1. Finally, note that the initial state of this bit after vuc reset is 0, even though no mvsread execution is in progress.


Instructions: mvsread

Opcode: special opcode, OP=01001, OC=001

Operation:

    b32 tmp[2][0x10];
    $stat[5] = 0; /* cycle 1 */
    MVSURF_IN.read(tmp); /* arbitrarily long */
    for (mb = 0; mb < 2; mb++) {
        for (i = 0; i < 0x10; i++) {
            MVSI[mb * 0x80 + i * 8 + 0] = SEX(tmp[mb][i][0:13]);
            MVSI[mb * 0x80 + i * 8 + 1] = SEX(tmp[mb][i][14:25]);
            MVSI[mb * 0x80 + i * 8 + 2] = tmp[mb][i & 0xc][26:30];
            MVSI[mb * 0x80 + i * 8 + 3] = tmp[mb][(i & 0xc) | 1][26 + (i & 3)];
            MVSI[mb * 0x80 + i * 8 + 4] = tmp[mb][15][26:27];
        }
    }
    $stat[5] = 1;

Execution time: >= 37 cycles

Execution unit conflicts: mvsread


VP2/VP3/VP4 vuc video registers 


* The video MMIO registers
* $parm register
* The RPIs and rpitab
* Macroblock input: mbiread, mbinext
* Table lookup instruction: lut


Introduction 


Todo: write me 


The video special registers 
Todo: the following information may only be valid for H.264 mode for now


* $sr0: ??? controls $sr48-$sr58 (bits 0-6 when set separately) [XXX] [VP2 only]
* $sr1: ??? similar to $sr0 (bits 0-4, probably more) [XXX] [VP2 only]
* $sr2/$spidx: partition and subpartition index, used to select which [sub]partitions some other special registers access:

  - bits 0-1: subpartition index
  - bits 2-3: partition index

  Note that, for indexing purposes, each partition index is considered to refer to an 8x8 block, and each subpartition index to a 4x4 block. If the partition/subpartition size is bigger than that, the indices will alias. Thus, for the 16x8 partitioning mode, $spidx equal to 0-7 will select the top partition, and $spidx equal to 8-15 will select the bottom one. For 8x16 partitioning, $spidx equal to 0-3 and 8-11 will select the left partition, while 4-7 and 12-15 will select the right partition.

* $sr3: ??? bits 0-4 affect $sr32-$sr42 [XXX] [VP2 only]
* $sr3: ??? [XXX] [VP3+ only]
* $sr4/$h2v: a scratch register to pass data from host to vuc [see vdec/vuc/intro.txt]
* $sr5/$v2h: a scratch register to pass data from vuc to host [see vdec/vuc/intro.txt]
* $sr6/$stat: some sort of control/status reg, writing 0x8000 alternates values between 0x8103 and 0 [XXX]


— bit 10: macroblock input available - set whenever there's a complete macroblock available from MBRING, 
cleared when mbinext instruction skips past the last currently available macroblock. Will break out of sleep 
instruction when set. 


- bit 11: $h2v modified - set when host writes H2V, cleared when $h2v is read by vuc, will break out of
sleep instruction when set.


- bit 12: watchdog hit - set 1 cycle after $icnt reaches WDCNT and it's not equal to 0xffff, cleared when
$icnt or WDCNT is modified in any way.


* $sr7/$parm: sequence/picture/slice parameters required by vuc hardware [see vdec/vuc/intro.txt] 
* $sr9/$cspos: call stack position, 0-8. Equal to the number of used entries on the stack. 


* $sr10/$cstop: call stack top. Writing to this register causes the written value to be pushed onto the stack, reading 
this register pops a value off the stack and returns it. 


* $sr11/$rpitab: D[] address of refidx -> dpb index translation table [VP2 only] 
* $sr15/$icnt: instruction/cycle counter [XXX: check nop, effect of delays]

* $sr16/$mvxl0: sign-extended mvd_l0[$spidx][0]
* $sr17/$mvyl0: sign-extended mvd_l0[$spidx][1]
* $sr18/$mvxl1: sign-extended mvd_l1[$spidx][0]
* $sr19/$mvyl1: sign-extended mvd_l1[$spidx][1]
* $sr20/$refl0: ref_idx_l0[$spidx>>2] [input]

* $sr21/$refl1: ref_idx_l1[$spidx>>2] [input]

* $sr22/$rpil0: dpb index of L0 reference picture for $spidx-selected partition


* $sr23/$rpil1: dpb index of L1 reference picture for $spidx-selected partition
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* $sr24/$mbflags: 
- bit 0: mb field decoding flag [RW]
- bit 1: is intra macroblock [RO]
- bit 2: is I NxN macroblock [RO]
- bit 3: transform size 8x8 flag [RW]
- bit 4: ??? [XXX]
- bit 5: is I 16x16 macroblock [RO]
- bit 6: partition selected by $spidx uses L0 or Bi prediction [RO]
- bit 7: partition selected by $spidx uses L1 or Bi prediction [RO]
- bit 8: mb field decoding flag for next macroblock [only valid if $sr6 bit 10 is set] [RO]
- bit 9: mb skip flag for next macroblock [only valid if $sr6 bit 10 is set] [RO]
- bit 10: partition selected by $spidx uses Direct prediction [RO]
- bit 11: any partition of macroblock uses Direct prediction [RO]
- bit 12: is I PCM macroblock [RO]
- bit 13: is P SKIP macroblock [RO]
* $sr25/$qpy:
- bits 0-5: mb qp delta [input] / QPy [output] [H.264]
- bits 0-5: quantiser scale code [input and output] [MPEG1/MPEG2]
- bits 8-11: intra chroma pred mode, values:
  * 0: DC [input], DC ??? [output] [XXX]
  * 1: horizontal [input, output]
  * 2: vertical [input, output]
  * 3: plane [input, output]
  * 4: DC ??? [output]
  * 5: DC ??? [output]
  * 6: DC ??? [output]
  * 7: DC ??? [output]
  * 8: DC ??? [output]
  * 9: DC ??? [output]
  * 0xa: DC ??? [output]

* $sr26/$qpc:

— bits 0-5: QPc for Cb [output] [H.264] 


— bits 8-13: QPc for Cr [output] [H.264] 




* $sr27/$mbpart:
- bits 0-1: macroblock partitioning type
  * 0: 16x16
  * 1: 16x8
  * 2: 8x16
  * 3: 8x8
- bits 2-3: partition 0 subpartitioning type
- bits 4-5: partition 1 subpartitioning type
- bits 6-7: partition 2 subpartitioning type
- bits 8-9: partition 3 subpartitioning type
- values of the subpartitioning type fields:
  * 0: 8x8
  * 1: 8x4
  * 2: 4x8
  * 3: 4x4
* $sr28/$mbxy: 


— bits 0-7: macroblock Y position 


— bits 8-15: macroblock X position 


* $sr29/$mbaddr: 


— bits 0-12: macroblock address 
— bit 15: first macroblock in slice flag 

* $sr30/$mbtype: macroblock type, for H.264:
- 0x00: I NxN
- 0x01: I 16x16 0 0 0
- 0x02: I 16x16 1 0 0
- 0x03: I 16x16 2 0 0
- 0x04: I 16x16 3 0 0
- 0x05: I 16x16 0 1 0
- 0x06: I 16x16 1 1 0
- 0x07: I 16x16 2 1 0
- 0x08: I 16x16 3 1 0
- 0x09: I 16x16 0 2 0
- 0x0a: I 16x16 1 2 0
- 0x0b: I 16x16 2 2 0
- 0x0c: I 16x16 3 2 0
- 0x0d: I 16x16 0 0 1
- 0x0e: I 16x16 1 0 1
- 0x0f: I 16x16 2 0 1
- 0x10: I 16x16 3 0 1
- 0x11: I 16x16 0 1 1
- 0x12: I 16x16 1 1 1


2.11. Video decoding, encoding, and processing 


455 


nVidia Hardware Documentation, Release git 


- 0x13: I 16x16 2 1 1
- 0x14: I 16x16 3 1 1
- 0x15: I 16x16 0 2 1
- 0x16: I 16x16 1 2 1
- 0x17: I 16x16 2 2 1
- 0x18: I 16x16 3 2 1
- 0x19: I PCM
- 0x20: P L0 16x16
- 0x21: P L0 L0 16x8
- 0x22: P L0 L0 8x16
- 0x23: P 8x8
- 0x24: P 8x8ref0
- 0x40: B Direct 16x16
- 0x41: B L0 16x16
- 0x42: B L1 16x16
- 0x43: B Bi 16x16
- 0x44: B L0 L0 16x8
- 0x45: B L0 L0 8x16
- 0x46: B L1 L1 16x8
- 0x47: B L1 L1 8x16
- 0x48: B L0 L1 16x8
- 0x49: B L0 L1 8x16
- 0x4a: B L1 L0 16x8
- 0x4b: B L1 L0 8x16
- 0x4c: B L0 Bi 16x8
- 0x4d: B L0 Bi 8x16
- 0x4e: B L1 Bi 16x8
- 0x4f: B L1 Bi 8x16
- 0x50: B Bi L0 16x8
- 0x51: B Bi L0 8x16
- 0x52: B Bi L1 16x8
- 0x53: B Bi L1 8x16
- 0x54: B Bi Bi 16x8
- 0x55: B Bi Bi 8x16
- 0x56: B 8x8
- 0x7e: B SKIP
- 0x7f: P SKIP
* $sr31/$submbtype: [VP2 only]
- bits 0-3: sub mb type[0]
- bits 4-7: sub mb type[1]
- bits 8-11: sub mb type[2]
- bits 12-15: sub mb type[3]
* $sr31: ??? [XXX] [VP3+ only]
* $sr32-$sr40: ??? affected by $sr3, unko21, read only [XXX]
* $sr41-$sr42: ??? affected by $sr3, unko21, read only [XXX] [VP2 only]
* $sr48-$sr58: ??? affected by writing $sr0 and $sr1, unko22, read only [XXX]
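The bitfield layouts of $spidx, $mbpart and $mbtype described above can be decoded in software with a few helpers. This is a hypothetical sketch: the function and enum names are made up for illustration, and the I 16x16 field split assumes that $mbtype 0x01-0x18 follow the H.264 mb_type ordering (prediction mode, chroma CBP, luma CBP flag), which the table above appears to match.

```c
#include <stdint.h>

/* $spidx: bits 0-1 subpartition index, bits 2-3 partition index. */
static unsigned spidx_make(unsigned part, unsigned subpart) {
    return ((part & 3) << 2) | (subpart & 3);
}

/* Partition (0 or 1) that a $spidx value aliases to in 16x8 / 8x16 mode. */
static unsigned spidx_part_16x8(unsigned spidx) {
    return (spidx >> 3) & 1;   /* 0-7: top, 8-15: bottom */
}
static unsigned spidx_part_8x16(unsigned spidx) {
    return (spidx >> 2) & 1;   /* 0-3, 8-11: left; 4-7, 12-15: right */
}

/* $mbpart: bits 0-1 partitioning type, bits 2+2*i..3+2*i partition i's subpartitioning. */
static unsigned mbpart_type(uint32_t mbpart) {
    return mbpart & 3;                      /* 0: 16x16, 1: 16x8, 2: 8x16, 3: 8x8 */
}
static unsigned mbpart_subtype(uint32_t mbpart, unsigned i) {
    return (mbpart >> (2 + 2 * i)) & 3;     /* 0: 8x8, 1: 8x4, 2: 4x8, 3: 4x4 */
}

/* Coarse classification of $mbtype values from the table above. */
enum mb_class { MB_INTRA, MB_P, MB_B, MB_B_SKIP, MB_P_SKIP, MB_UNKNOWN };

static enum mb_class mbtype_class(unsigned mbtype) {
    if (mbtype <= 0x19) return MB_INTRA;                 /* I NxN, I 16x16, I PCM */
    if (mbtype >= 0x20 && mbtype <= 0x24) return MB_P;
    if (mbtype >= 0x40 && mbtype <= 0x56) return MB_B;
    if (mbtype == 0x7e) return MB_B_SKIP;
    if (mbtype == 0x7f) return MB_P_SKIP;
    return MB_UNKNOWN;
}

/* Split $mbtype 0x01-0x18 ("I 16x16 p c l") into its three fields. */
static void i16x16_fields(unsigned mbtype, unsigned *pred,
                          unsigned *cbp_chroma, unsigned *cbp_luma) {
    unsigned n = mbtype - 1;
    *pred = n % 4;
    *cbp_chroma = (n / 4) % 3;
    *cbp_luma = n / 12;
}
```

For example, $mbtype 0x0d decodes to prediction mode 0, chroma CBP 0 and luma CBP flag 1, matching the "I 16x16 0 0 1" entry.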


Table lookup instruction: lut 


Performs a lookup of src1 in the lookup table selected by the low 4 bits of src2. The tables are codec-specific and generated
by hardware from the current contents of the video special registers.


Todo: recheck this instruction on VP3 and other codecs 


Tables 0-3 are an alternate way of accessing the H.264 inter prediction registers [$sr16-$sr23]. The table index is 1-bit.
Index 0 selects the L0 register, index 1 selects the L1 register. Table 0 is the $mvxl* registers, 1 is $mvyl*, 2 is $refl*,
3 is $rpil*.


Tables 4-7 behave like tables 0-3, except the lookup returns 0 if $mbtype is equal to 0x7f [P SKIP].


Table 8, known as pcnt, is used to look up partition and subpartition counts. The index is 3-bit. Indices 0-3 return the
subpartition count of the corresponding partition, while indices 4-7 return the partition count of the macroblock.


Tables 9 and 10 are indexed in a special manner: the index selects a partition and a subpartition. Bits 0-7 of the index
are the partition index, bits 8-15 of the index are the subpartition index. The partition and subpartition indices behave
as in the H.264 spec: valid indices are 0, 0-1, or 0-3 depending on the partitioning/subpartitioning mode.


Table 9, known as spidx, translates indices of the form given above into $spidx values. If both partition and subpartition 
index are valid for the current partitioning and subpartitioning mode, the value returned is the value that has to be poked 
into $spidx to access the selected [sub]partition. Otherwise, junk may be returned. 


Table 10, known as pnext, advances the partition/subpartition index to the next valid subpartition or partition. The 
returned value is an index in the same format as the input index. Additionally, the predicate output is set if the 
partition index was not incremented [transition to the next subpartition of a partition], cleared if the partition index 
was incremented [transition to the first subpartition of the next partition]. 


Table 11, known as pmode, returns the inter prediction mode for a given partition. The index is 2-bit and selects the
partition. If the index is less than pcnt[4] and $mbtype is inter-predicted, returns the inter prediction mode, otherwise
returns 0. The prediction modes are:


* 0: direct
* 1: L0
* 2: L1
* 3: Bi

Tables 12-15 are unused and always return 0. [XXX: 12 used for VC-1 on VP3]

Instructions: lut pdst, dst, src1, src2 OP=11100

Opcode: base opcode, OP as above
Operation:


/* helper functions */
int pcnt() {
    switch ($mbtype) {
        case 0:         /* I NxN */
        case 0x19:      /* I PCM */
            return 4;
        case 1..0x18:   /* I 16x16 */
            return 1;
        case 0x20:      /* P L0 16x16 */
            return 1;
        case 0x21:      /* P L0 L0 16x8 */
        case 0x22:      /* P L0 L0 8x16 */
            return 2;
        case 0x23:      /* P 8x8 */
        case 0x24:      /* P 8x8ref0 */
            return 4;
        case 0x40:      /* B Direct 16x16 */
        case 0x41:      /* B L0 16x16 */
        case 0x42:      /* B L1 16x16 */
        case 0x43:      /* B Bi 16x16 */
            return 1;
        case 0x44:      /* B L0 L0 16x8 */
        case 0x45:      /* B L0 L0 8x16 */
        case 0x46:      /* B L1 L1 16x8 */
        case 0x47:      /* B L1 L1 8x16 */
        case 0x48:      /* B L0 L1 16x8 */
        case 0x49:      /* B L0 L1 8x16 */
        case 0x4a:      /* B L1 L0 16x8 */
        case 0x4b:      /* B L1 L0 8x16 */
        case 0x4c:      /* B L0 Bi 16x8 */
        case 0x4d:      /* B L0 Bi 8x16 */
        case 0x4e:      /* B L1 Bi 16x8 */
        case 0x4f:      /* B L1 Bi 8x16 */
        case 0x50:      /* B Bi L0 16x8 */
        case 0x51:      /* B Bi L0 8x16 */
        case 0x52:      /* B Bi L1 16x8 */
        case 0x53:      /* B Bi L1 8x16 */
        case 0x54:      /* B Bi Bi 16x8 */
        case 0x55:      /* B Bi Bi 8x16 */
            return 2;
        case 0x56:      /* B 8x8 */
            return 4;
        case 0x7e:      /* B SKIP */
            return 4;
        case 0x7f:      /* P SKIP */
            return 1;
        /* in other cases returns junk */
    }
}
int spcnt(int idx) {
    if (pcnt() < 4) {
        return 1;
    } else if ($mbtype == 0 || $mbtype == 0x19) {  /* I NxN or I PCM */
        return ($mbflags[3:3] ? 1 : 4);            /* transform size 8x8 flag */
    } else {
        smt = ($submbtype >> (idx * 4)) & 0xf;
        /* XXX */
    }
}
int mbpartmode_16x8() {
    switch ($mbtype) {
        case 0x21:      /* P L0 L0 16x8 */
        case 0x44:      /* B L0 L0 16x8 */
        case 0x46:      /* B L1 L1 16x8 */
        case 0x48:      /* B L0 L1 16x8 */
        case 0x4a:      /* B L1 L0 16x8 */
        case 0x4c:      /* B L0 Bi 16x8 */
        case 0x4e:      /* B L1 Bi 16x8 */
        case 0x50:      /* B Bi L0 16x8 */
        case 0x52:      /* B Bi L1 16x8 */
        case 0x54:      /* B Bi Bi 16x8 */
            return 1;
        default:
            return 0;
    }
}
int submbpartmode_8x4(int idx) {
    smt = ($submbtype >> (idx * 4)) & 0xf;
    switch (smt) {
        /* XXX */
    }
}
int mbpartpredmode(int idx) {
    /* XXX */
}
/* end of helper functions */

table = src2 & 0xf;
if (table < 8) {
    which = src1 & 1;
    switch (table & 3) {
        case 0: result = (which ? $mvxl1 : $mvxl0); break;
        case 1: result = (which ? $mvyl1 : $mvyl0); break;
        case 2: result = (which ? $refl1 : $refl0); break;
        case 3: result = (which ? $rpil1 : $rpil0); break;
    }
    if ((table & 4) && $mbtype == 0x7f)    /* P SKIP */
        result = 0;
    presult = result & 1;
} else if (table == 8) {    /* pcnt */
    idx = src1 & 7;
    if (idx < 4) {
        result = spcnt(idx);
    } else {
        result = pcnt();
    }
    presult = result & 1;
} else if (table == 9 || table == 10) {
    pidx = src1 & 7;
    sidx = src1 >> 8 & 3;
    if (table == 9) {    /* spidx */
        if (mbpartmode_16x8())
            resp = (pidx & 1) << 1;
        else
            resp = (pidx & 3);
        if (submbpartmode_8x4(resp >> 2))
            ress = (sidx & 1) << 1;
        else
            ress = (sidx & 3);
        result = resp << 2 | ress;
        presult = result & 1;
    } else {    /* pnext */
        if (pidx < 4) {
            c = spcnt(pidx);
        } else {
            c = pcnt();
        }
        ress = sidx + 1;
        if (ress >= c) {
            resp = (pidx & 3) + 1;
            ress = 0;
        } else {
            resp = pidx;
        }
        result = ress << 8 | resp;
        presult = ress != 0;
    }
} else if (table == 11) {    /* pmode */
    result = mbpartpredmode(src1 & 3);
    presult = result & 1;
} else {
    result = 0;
    presult = 0;
}
dst = result;
pdst = presult;


Execution time: 1 cycle
Predicate output:

* Tables 0-9 and 11-15: bit 0 of the result
* Table 10: 1 if transition to next subpartition in a partition, 0 if transition to next partition
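To show how table 10 could be used to walk every [sub]partition of a macroblock, the pnext branch of the pseudocode above can be modelled in software. This is a hypothetical sketch, not behaviour verified on hardware: the partition and subpartition counts are passed in directly instead of being derived from $mbtype and $submbtype, and all function names are illustrative.

```c
#include <stdbool.h>

/* Index format of tables 9 and 10: bits 0-7 partition, bits 8-15 subpartition. */
static unsigned pidx_make(unsigned part, unsigned sub) {
    return (sub << 8) | part;
}

/* Software model of table 10 (pnext).
   pcnt: partition count; spcnt[i]: subpartition count of partition i.
   *pred is set when only the subpartition index advanced. */
static unsigned pnext_model(unsigned idx, unsigned pcnt,
                            const unsigned spcnt[4], bool *pred) {
    unsigned pidx = idx & 7;
    unsigned sidx = (idx >> 8) & 3;
    unsigned c = pidx < 4 ? spcnt[pidx] : pcnt;
    unsigned resp = pidx;
    unsigned ress = sidx + 1;
    if (ress >= c) {            /* past the last subpartition: next partition */
        resp = (pidx & 3) + 1;
        ress = 0;
    }
    *pred = ress != 0;
    return ress << 8 | resp;
}

/* Visit every [sub]partition starting from index 0, stopping once the
   partition index reaches pcnt. Returns the number of visits. */
static unsigned count_subpartitions(unsigned pcnt, const unsigned spcnt[4]) {
    unsigned idx = 0, visited = 0;
    bool pred;
    while ((idx & 7) < pcnt) {
        visited++;
        idx = pnext_model(idx, pcnt, spcnt, &pred);
    }
    return visited;
}
```

With four partitions having 4, 2, 2 and 1 subpartitions, the walk visits 9 [sub]partitions, and the predicate stays set while the loop remains inside one partition.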


VP2 vuc output


Contents


* VP2 vuc output


— Introduction 


Introduction 


Todo: write me 
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vuc performance monitoring signals 


Contents 


* vuc performance monitoring signals 


— Introduction 


Introduction 


Todo: write me 


2.11.3 VP2 video decoding 


Contents: 


VP2 xtensa processors 


Todo: write me 


Configured options: 


Code Density Option 


Loop Option 


16-bit Integer Multiply Option 


Miscellaneous Operations Option:
- InstructionCLAMPS = 0
- InstructionMINMAX = 1
- InstructionNSA = 0
- InstructionSEXT = 0


Boolean Option 


Exception Option:
- NDEPC = 1
- ResetVector = 0xc0000020
- UserExceptionVector = 0xc0000420
- KernelExceptionVector = 0xc0000600
- DoubleExceptionVector = 0xc0000a00


Interrupt Option:
- NINTERRUPT = 10
- INTTYPE[0]: Timer
- INTTYPE[1]: Timer
- INTTYPE[2]: Level
- INTTYPE[3]: XXX Level/Edge/WriteErr
- INTTYPE[4]: NMI
- INTTYPE[5]: Level
- INTTYPE[6]: Level
- INTTYPE[7]: Level
- INTTYPE[8]: Level
- INTTYPE[9]: Level


High-priority Interrupt Option:
- NLEVEL: 6
- LEVEL[0]: 1
- LEVEL[1]: 1
- LEVEL[2]: 2
- LEVEL[3]: 3
- LEVEL[4]: 7
- LEVEL[5]: 4
- LEVEL[6]: 5
- LEVEL[7]: 5
- LEVEL[8]: 5
- LEVEL[9]: 5
- EXCMLEVEL: 1
- NNMI: 1
- InterruptVector[2] = 0xc0000b40
- InterruptVector[3] = 0xc0000c00
- InterruptVector[4] = 0xc0000d20
- InterruptVector[5] = 0xc0000e00
- InterruptVector[6] = 0xc0000f00
- InterruptVector[7] = 0xc0001000


Timer Interrupt Option:
- NCOMPARE = 2
- TIMERINT[0]: 0
- TIMERINT[1]: 1


Instruction Cache Option:
- InstCacheWayCount: 3
- InstCacheLineBytes: 0x20
- InstCacheBytes: 0x3000


Instruction Cache Test Option 


Instruction Cache Index Lock Option


Data Cache Option:
- DataCacheWayCount: 2
- DataCacheLineBytes: 0x20
- DataCacheBytes: 0x1000
- IsWriteback: Yes


Data Cache Test Option 


Data Cache Index Lock Option 
XLMI Option:
- XLMIBytes = 256kB
- XLMIPAddr = 0xcffc0000


Region Protection Option 


Windowed Register Option:
- WindowOverflow4 = 0xc0000800
- WindowUnderflow4 = 0xc0000840
- WindowOverflow8 = 0xc0000880
- WindowUnderflow8 = 0xc00008c0
- WindowOverflow12 = 0xc0000900
- WindowUnderflow12 = 0xc0000940
- NAREG = 32


Processor Interface Option 
Debug Option:
- DEBUGLEVEL = 6
- NIBREAK = 2
- NDBREAK = 2
- SZICOUNT = 32
- OCD: XXX

Trace Port Option? [XXX]


VLD: variable length decoding 


Contents 


* VLD: variable length decoding

- Introduction
- The registers
- Reset
- Parameter and position registers
- Internal state for context selection
- Interrupts
- Stream input
- MBRING output
- Command and status registers
  * Command 0: GET UE
  * Command 1: GET SE
  * Command 2: GETBITS
  * Command 3: NEXT START CODE
  * Command 4: CABAC START
  * Command 5: MORE RBSP DATA
  * Command 6: MB SKIP FLAG
  * Command 7: END OF SLICE FLAG
  * Command 8: CABAC INIT CTX
  * Command 9: MACROBLOCK SKIP MBFDF
  * Command 0xa: MACROBLOCK LAYER MBFDF
  * Command 0xb: PRED WEIGHT TABLE
  * Command 0xc: SLICE DATA


Introduction 


The VLD is the first stage of the VP2 decoding pipeline. It is part of PBSP and deals with decoding the H.264 bitstream 
into syntax elements. 


The input to the VLD is the raw H.264 bitstream. The output of VLD is MBRING, a ring buffer structure storing the 
decoded syntax elements in the form of word-aligned packets. 


The VLD only deals with parsing the NALs containing the slice data; the remaining NAL types are supposed to be
parsed by the host. Further, the hardware can only parse the pred weight table and slice data elements efficiently. The
remaining parts of the slice NAL are supposed to be parsed by the firmware controlling the VLD in a semi-manual
manner: the VLD provides commands that parse single syntax elements.


The following H.264 profiles are supported: 
* Constrained Baseline 
* Baseline [only in single-macroblock mode if FMO used - see below] 
* Main 
* Progressive High 
* High 
* Multiview High 
* Stereo High 
The limitations are: 
* max picture width and height: 128 macroblocks 


* max macroblocks in picture: 8192 


Todo: width/height max may be 255? 


There are two modes of operation that the VLD can be used with: single-macroblock mode and whole-slice mode. In the
single-macroblock mode, parsing for each macroblock has to be manually triggered by the firmware. In whole-slice
mode, the firmware triggers processing of a whole slice, and the hardware automatically iterates over all macroblocks
in the slice. However, whole-slice mode doesn't support Flexible Macroblock Ordering, a.k.a. slice groups. Thus,
single-macroblock mode has to be used for sequences with a non-zero value of num slice groups minus1.


The VLD keeps extensive hidden internal state, including: 
* pred weight table data, to be prepended to the next emitted macroblock 
* bitstream position, zero byte count [for escaping], and lookahead buffer 
* CABAC valMPS, pStateIdx, codIOffset, codIRange state
* previously decoded parts of macroblock data, used for CABAC and CAVLC context selection algorithms 


* already queued but not yet written MBRING output data 
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The registers 


The VLD registers are located in PBSP XLMI space at addresses 0x00000:0x08000 [BAR0 addresses
0x103000:0x103200]. They are:

XLMI             MMIO           Name         Description
0x00000          0x103000       PARM_0       parameters from sequence/picture parameter structs and the slice header
0x00100          0x103004       PARM_1       parameters from sequence/picture parameter structs and the slice header
0x00200          0x103008       MB_POS       position of the current macroblock
0x00300          0x10300c       COMMAND      writing executes a VLD command
0x00400          0x103010       STATUS       shows busy status of various parts of the VLD
0x00500          0x103014       RESULT       result of a command
0x00700          0x10301c       INTR_EN      interrupt enable mask
0x00800          0x103020       ???          ???
0x00900          0x103024       INTR         interrupt status
0x00a00          0x103028       RESET        resets the VLD and its registers to initial state
0x01000+i*0x100  0x103040+i*4   CONF[0:8]    length and enable bit of stream buffer i
0x01100+i*0x100  0x103044+i*4   OFFSET[0:8]  offset of stream buffer i
0x02000          0x103080       BITPOS       the bit position in the stream
0x04000          0x103100       OFFSET       the MBRING offset
0x04100          0x103104       HALT_POS     the MBRING halt position
0x04200          0x103108       WRITE_POS    the MBRING write position
0x04300          0x10310c       SIZE         the MBRING size
0x04400          0x103110       TRIGGER      writing executes MBRING commands


Todo: reg 0x00800 


Reset 


The engine may be reset at any time by poking the RESET register. 


BAR0 0x103028 / XLMI 0x00a00: RESET
Any write will cause the VLD to be reset. All internal state is reset to default values. All user-writable registers
are set to 0, except UNK8 which is set to 0xffffffff.


Parameter and position registers 


The values of these registers are used by some of the VLD commands. PARM registers should be initialised with
values derived from sequence parameters, picture parameters, and slice header. MB_POS should be set to the address
of the currently processed macroblock [for single-macroblock operation] or the first macroblock of the slice [for
whole-slice operation]. In whole-slice operation, MB_POS is updated by the hardware to the position of the last
macroblock in the parsed slice.


For details on use of this information by specific commands, see their documentation. 
BAR0 0x103000 / XLMI 0x00000: PARM_0

* bit 0: entropy coding mode flag - set as in picture parameters
* bits 1-8: width mbs - set to pic width in mbs minus1 + 1
* bit 9: mbaff frame flag - set to (mb adaptive frame field flag && !field pic flag)
* bits 10-11: picture structure - one of:
  - 0: frame - set if !field pic flag
  - 1: top field - set if field pic flag && !bottom field flag
  - 2: bottom field - set if field pic flag && bottom field flag
* bits 12-16: nal unit type - set as in the slice NAL header [XXX: check]
* bit 17: constrained intra pred - set as in picture parameters [XXX: check]
* bits 18-19: cabac init idc - set as in slice header, for P and B slices
* bits 20-21: chroma format idc - if parsing auxiliary coded picture, set to 0, otherwise set as in sequence
  parameters
* bit 22: direct 8x8 inference flag - set as in sequence parameters
* bit 23: transform 8x8 mode flag - set as in picture parameters
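As a worked example of the PARM_0 layout, the sketch below packs the fields listed above into a register value. It is a hypothetical helper: the struct and function names are made up, the inputs are assumed to be already parsed from the SPS, PPS and slice header, and the nal unit type and constrained intra pred bits carry the same [XXX: check] caveats as above.

```c
#include <stdint.h>

/* Hypothetical container for the values PARM_0 is derived from. */
struct parm0_fields {
    unsigned entropy_coding_mode_flag;      /* PPS */
    unsigned pic_width_in_mbs_minus1;       /* SPS */
    unsigned mb_adaptive_frame_field_flag;  /* SPS */
    unsigned field_pic_flag;                /* slice header */
    unsigned bottom_field_flag;             /* slice header */
    unsigned nal_unit_type;                 /* slice NAL header */
    unsigned constrained_intra_pred_flag;   /* PPS */
    unsigned cabac_init_idc;                /* slice header, P/B slices */
    unsigned chroma_format_idc;             /* SPS, or 0 for auxiliary pictures */
    unsigned direct_8x8_inference_flag;     /* SPS */
    unsigned transform_8x8_mode_flag;       /* PPS */
};

static uint32_t parm0_pack(const struct parm0_fields *f) {
    unsigned mbaff = f->mb_adaptive_frame_field_flag && !f->field_pic_flag;
    /* picture structure: 0 frame, 1 top field, 2 bottom field */
    unsigned structure = !f->field_pic_flag ? 0 : (f->bottom_field_flag ? 2 : 1);
    uint32_t v = 0;
    v |= (uint32_t)(f->entropy_coding_mode_flag & 1) << 0;
    v |= (uint32_t)((f->pic_width_in_mbs_minus1 + 1) & 0xff) << 1;
    v |= (uint32_t)mbaff << 9;
    v |= (uint32_t)structure << 10;
    v |= (uint32_t)(f->nal_unit_type & 0x1f) << 12;
    v |= (uint32_t)(f->constrained_intra_pred_flag & 1) << 17;
    v |= (uint32_t)(f->cabac_init_idc & 3) << 18;
    v |= (uint32_t)(f->chroma_format_idc & 3) << 20;
    v |= (uint32_t)(f->direct_8x8_inference_flag & 1) << 22;
    v |= (uint32_t)(f->transform_8x8_mode_flag & 1) << 23;
    return v;
}
```

For an 80-macroblock-wide CABAC frame with nal unit type 5, 4:2:0 chroma, direct 8x8 inference and 8x8 transforms enabled, this yields 0xd050a1.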
BAR0 0x103004 / XLMI 0x00100: PARM_1
* bits 0-1: slice type - set as in slice header


* bits 2-14: slice tag - used to tag macroblocks in internal state with their slices, for determining availability
status in CABAC/CAVLC context selection algorithms. See command description.


* bits 15-19: num ref idx l0 active minus1 - set as in slice header, for P and B slices


* bits 20-24: num ref idx l1 active minus1 - set as in slice header, for B slices


* bits 25-30: sliceqpy - set to (pic init qp minus26 + 26 + slice qp delta)
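The sliceqpy arithmetic and the PARM_1 field positions can be checked with a small packing helper. This is a hypothetical sketch: the function name is made up, and slice type is assumed to be already reduced by the caller to a value that fits the 2-bit field, since the text only says to set it as in the slice header.

```c
#include <stdint.h>

/* Packs PARM_1; arguments mirror the H.264 syntax elements. */
static uint32_t parm1_pack(unsigned slice_type, unsigned slice_tag,
                           unsigned num_ref_idx_l0_active_minus1,
                           unsigned num_ref_idx_l1_active_minus1,
                           int pic_init_qp_minus26, int slice_qp_delta) {
    /* sliceqpy = pic init qp minus26 + 26 + slice qp delta (0..51 in valid streams) */
    unsigned sliceqpy = (unsigned)(pic_init_qp_minus26 + 26 + slice_qp_delta);
    return (uint32_t)(slice_type & 3)                             /* bits 0-1 */
         | (uint32_t)(slice_tag & 0x1fff) << 2                    /* bits 2-14 */
         | (uint32_t)(num_ref_idx_l0_active_minus1 & 0x1f) << 15  /* bits 15-19 */
         | (uint32_t)(num_ref_idx_l1_active_minus1 & 0x1f) << 20  /* bits 20-24 */
         | (uint32_t)(sliceqpy & 0x3f) << 25;                     /* bits 25-30 */
}
```

For example, pic init qp minus26 = 0 and slice qp delta = -4 give sliceqpy 22, stored in bits 25-30.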
BAR0 0x103008 / XLMI 0x00200: MB_POS

* bits 0-12: addr - address of the macroblock

* bits 13-20: x - x coordinate of the macroblock in macroblock units

* bits 21-28: y - y coordinate of the macroblock in macroblock units


* bit 29: first - 1 if the described macroblock is the first macroblock in its slice, 0 otherwise


Internal state for context selection 


Both CAVLC and CABAC sometimes use decoded data of previous macroblocks in the slice to determine the decoding 
algorithm for syntax elements of the current macroblock. The VLD thus stores this data in its internal hidden memory. 


Todo: what macroblocks are stored, indexing, tagging, reset state 


For each macroblock, the following data is stored:
* slice tag
* mb field decoding flag
* mb skip flag
* mb type
* coded block pattern
* transform size 8x8 flag
* intra chroma pred mode
* ref idx lX[i]
* mvd lX
* coded block flag for each block
* total coeffs for each luma 4x4 / luma AC block


Todo: and availability status? 


Additionally, the following data of the previous decoded macroblock [not indexed by macroblock address] is stored: 


* mb qp delta 


Interrupts 


Todo: write me 


BAR0 0x10301c / XLMI 0x00700: INTR_EN
* bit 0: UNK INPUT 1
* bit 1: END OF STREAM
* bit 2: UNK INPUT 3
* bit 3: MBRING HALT
* bit 4: SLICE DATA DONE

BAR0 0x103024 / XLMI 0x00900: INTR
* bits 0-3: INPUT, one of:
  - 0: no interrupt pending
  - 1: UNK INPUT 1
  - 2: END OF STREAM
  - 3: UNK INPUT 3
  - 4: SLICE DATA DONE
* bit 4: MBRING FULL


Stream input 


Todo: RE and write me 


MBRING output 


Todo: write me 
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Command and status registers 


Todo: write me 


Command 0: GET UE 


Parameter: none 
Result: the decoded value of parsed bitfield, or 0xffffffff if out of range


Parses one ue(v) element as defined in H.264 spec. Only elements in range 0..0xfffe [up to 31 bits in the bitstream] are
supported by this command. If the next bits of the bitstream are a valid ue(v) element in supported range, the element
is parsed, the bitstream pointer advances past it, and its parsed value is returned as the result. Otherwise, the bitstream
pointer is not modified and 0xffffffff is returned.


Operation: 


if (nextbits(16) != 0) {
    int bitcnt = 0;
    while (getbits(1) == 0)
        bitcnt++;
    return (1 << bitcnt) - 1 + getbits(bitcnt);
} else {
    return 0xffffffff;
}
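The GET UE operation can be modelled in software over an in-memory buffer. This is a minimal sketch under stated assumptions: the bit reader and its names are hypothetical, it reads MSB first, treats bits past the end of the buffer as zero, and ignores the emulation prevention bytes that the VLD handles in hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over an RBSP buffer. */
struct bitreader {
    const uint8_t *buf;
    size_t len_bits;
    size_t pos;           /* current bit position */
};

static uint32_t peek_bits(const struct bitreader *br, unsigned n) {
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t p = br->pos + i;
        unsigned bit = p < br->len_bits ? (br->buf[p >> 3] >> (7 - (p & 7))) & 1 : 0;
        v = (v << 1) | bit;
    }
    return v;
}

static uint32_t read_bits(struct bitreader *br, unsigned n) {
    uint32_t v = peek_bits(br, n);
    br->pos += n;
    return v;
}

/* Model of GET UE: codes with 16 or more leading zeros are rejected
   and the position is left unchanged. */
static uint32_t get_ue(struct bitreader *br) {
    if (peek_bits(br, 16) == 0)
        return 0xffffffff;
    unsigned bitcnt = 0;
    while (read_bits(br, 1) == 0)
        bitcnt++;
    return ((1u << bitcnt) - 1) + read_bits(br, bitcnt);
}
```

The 16-bit lookahead check is what bounds the result: at most 15 leading zeros are accepted, so the largest code is 15 zeros, a 1 and 15 ones (31 bits), decoding to 0xfffe.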


Command 1: GET SE


Parameter: none 
Result: the decoded value of parsed bitfield, or 0x80000000 if out of range 


Parses one se(v) element as defined in H.264 spec. Only elements in range -0x7fff..0x7fff [up to 31 bits in the
bitstream] are supported by this command. If the next bits of the bitstream are a valid se(v) element in supported
range, the element is parsed, the bitstream pointer advances past it, and its parsed value is returned as the result.
Otherwise, the bitstream pointer is not modified and 0x80000000 is returned.


Operation: 
if (nextbits(16) != 0) {
    int bitcnt = 0;
    while (getbits(1) == 0)
        bitcnt++;
    int tmp = (1 << bitcnt) - 1 + getbits(bitcnt);
    if (tmp & 1)
        return (tmp + 1) >> 1;
    else
        return -(tmp >> 1);
} else {
    return 0x80000000;
}
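The signed mapping at the end of GET SE is the H.264 signed Exp-Golomb mapping (clause 9.1.1) applied to the ue(v) code number. The sketch below isolates that arithmetic, plus the propagation of the out-of-range sentinel; the function names are hypothetical.

```c
#include <stdint.h>

/* Maps a ue(v) code number k to its se(v) value: 0, 1, -1, 2, -2, ... */
static int32_t se_from_codenum(uint32_t k) {
    if (k & 1)
        return (int32_t)((k + 1) >> 1);   /* odd code numbers are positive */
    return -(int32_t)(k >> 1);            /* even code numbers are zero or negative */
}

/* GET SE result given the GET UE result: 0xffffffff becomes 0x80000000. */
static uint32_t get_se_result(uint32_t ue_result) {
    if (ue_result == 0xffffffffu)
        return 0x80000000u;
    return (uint32_t)se_from_codenum(ue_result);
}
```

Since the largest accepted code number is 0xfffe, the results span exactly -0x7fff..0x7fff, matching the documented range.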
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Command 2: GETBITS 


Parameter: number of bits to read, or 0 to read 32 bits [5 bits] 
Result: the bits from the bitstream 
Given parameter n, returns the next (n?n:32) bits from the bitstream as an unsigned integer. 


Operation: 


return getbits (n?n:32); 


Command 3: NEXT START CODE 


Parameter: none 
Result: the next start code found 


Skips bytes in the raw bitstream until the start code [00 00 01] is found. Then reads the byte after the start code and
returns it as the result. The bitstream pointer is advanced to point after the returned byte.


Operation: 


byte_align();
while (nextbytes_raw(3) != 1)
    getbits_raw(8);
getbits_raw(24);
return getbits_raw(8);
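A byte-level software model of NEXT START CODE is shown below. It is a hypothetical sketch: it works on an already byte-aligned buffer, and it returns -1 when no complete start code is left, whereas what the hardware does at the end of its stream buffers is not documented here.

```c
#include <stddef.h>
#include <stdint.h>

/* Starting at byte *pos, skips bytes until 00 00 01, returns the byte that
   follows it and leaves *pos just past that byte. Returns -1 (and leaves
   *pos unchanged) if no complete start code plus trailing byte is found. */
static int next_start_code(const uint8_t *buf, size_t len, size_t *pos) {
    size_t p = *pos;
    while (p + 3 < len) {
        if (buf[p] == 0 && buf[p + 1] == 0 && buf[p + 2] == 1) {
            *pos = p + 4;
            return buf[p + 3];
        }
        p++;
    }
    return -1;
}
```

Applied to a stream containing `00 00 01 65`, it returns 0x65, the NAL header byte of an IDR slice.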


Command 4: CABAC START 


Parameter: none 
Result: none 


Skips bits in the bitstream until the current bit position is byte-aligned, then initialises the arithmetic decoding engine 
registers codIRange and codIOffset, as per H.264.9.3.1.2. 


Operation:


byte_align();
cabac_init_engine();


Command 5: MORE RBSP DATA 


Parameter: none 
Result: 1 if there's more data in RBSP, 0 otherwise 


Returns 0 if there's a valid RBSP trailing bits element at the current bit position, 1 otherwise. Does not modify the 
bitstream pointer. 


Operation: 


return more_rbsp_data();
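The check can be modelled after the H.264 definition of more_rbsp_data() (clause 7.2): more data remains unless the current position is at or past the rbsp stop one bit, which is the last 1 bit of the RBSP. The sketch assumes the buffer holds exactly one RBSP with trailing zero bytes already stripped; whether the VLD implements exactly this test is an assumption, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if RBSP data remains before the rbsp_stop_one_bit.
   buf holds len_bits bits, MSB first; pos is the current bit position. */
static bool more_rbsp_data_model(const uint8_t *buf, size_t len_bits, size_t pos) {
    /* Scan backwards for the last 1 bit: the rbsp_stop_one_bit. */
    for (size_t p = len_bits; p-- > 0;) {
        if ((buf[p >> 3] >> (7 - (p & 7))) & 1)
            return pos < p;
    }
    return false;   /* no stop bit at all: treat as no more data */
}
```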
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Command 6: MB SKIP FLAG 


Parameter: none 
Result: value of parsed mb skip flag 


Parses the CABAC mb skip flag element. SLICE_POS has to be set to the address of the macroblock to which
this element applies.


Operation: 


return cabac_mb_skip_flag();


Command 7: END OF SLICE FLAG 


Parameter: none 
Result: value of parsed end of slice flag 
Parses the CABAC end of slice flag element. 


Operation: 


return cabac_terminate();


Command 8: CABAC INIT CTX 


Parameter: none 
Result: none 


Initialises the CABAC context variables, as per H.264.9.3.1.1. The slice type, cabac init idc [for P/B slices], and
sliceqpy fields have to be set in the PARM registers for this command to work properly.


Operation: 


cabac_init_ctx();


Command 9: MACROBLOCK SKIP MBFDF 


Parameter: mb field decoding flag presence [1 bit] 
Result: none


If parameter is 1, the mb field decoding flag syntax element is parsed. Otherwise, the value of mb field decoding flag
is inferred from preceding macroblocks. A skipped macroblock with the thus determined value of mb field decoding
flag is emitted into the MBRING, and its data is stored into internal state. SLICE_POS has to be set to the address of
this macroblock.


Operation: 


if (param) {
    if (entropy_coding_mode_flag)
        this_mb.mb_field_decoding_flag = cabac_mb_field_decoding_flag();
    else
        this_mb.mb_field_decoding_flag = getbits(1);
} else {
    this_mb.mb_field_decoding_flag = mb_field_decoding_flag_infer();
}
this_mb.mb_skip_flag = 1;
this_mb.slice_tag = slice_tag;
mbring_emit_macroblock();


Todo: more inferred crap 


Command 0xa: MACROBLOCK LAYER MBFDF 


Parameter: mb field decoding flag presence [1 bit] 
Result: none 


If parameter is 1, the mb field decoding flag syntax element is parsed. Otherwise, the value of mb field decoding flag
is inferred from preceding macroblocks. A macroblock layer syntax structure is parsed from the bitstream, and data for
the decoded macroblock is emitted into the MBRING and stored into internal state. SLICE_POS has to be set to the
address of this macroblock.


Operation: 


if (param) {
    if (entropy_coding_mode_flag)
        this_mb.mb_field_decoding_flag = cabac_mb_field_decoding_flag();
    else
        this_mb.mb_field_decoding_flag = getbits(1);
} else {
    this_mb.mb_field_decoding_flag = mb_field_decoding_flag_infer();
}
this_mb.mb_skip_flag = 0;
this_mb.slice_tag = slice_tag;
macroblock_layer();


Command 0xb: PRED WEIGHT TABLE 


Parameter: none 
Result: none 


Parses the pred weight table element, stores its contents in internal memory, and advances the bitstream to the end of
the element.


Operation: 


Todo: write me 


Command 0xc: SLICE DATA


Parameter: none 
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Result: none 


Writes the stored pred weight table data to MBRING, then parses the slice data element, storing decoded data
into MBRING and halting when the RBSP trailing bit sequence is encountered. When done, raises the SLICE DATA
DONE interrupt. The bitstream pointer is updated to point to the RBSP trailing bits. SLICE_POS has to be set to the
address of the first macroblock of the slice before this command is called. When this command finishes, SLICE_POS
is updated to the address of the last macroblock in the parsed slice.


Operation: 


if (entropy_coding_mode_flag) {
    cabac_init_ctx();
    byte_align();
    cabac_init_engine();
}
mb_pos.first = 1;
first = 1;
skip_pending = 0;
end = 0;
bottom = 0;
while (1) {
    if (slice_type == P || slice_type == B) {
        if (entropy_coding_mode_flag) {
            while (1) {
                tmp = cabac_mb_skip_flag();
                if (!tmp)
                    break;
                skip_pending++;
                if (!mbaff_frame_flag || bottom) {
                    end = cabac_terminate();
                    if (end)
                        break;
                }
                bottom = !bottom;
            }
        } else {
            skip_pending = get_ue();
            end = !more_rbsp_data();
            bottom ^= skip_pending & 1;
        }
    } else {
        skip_pending = 0;
    }
    while (1) {
        if (!skip_pending)
            break;
        if (mbaff_frame_flag && bottom && skip_pending < 2)
            break;
        if (first) {
            first = 0;
        } else {
            mb_pos_advance();
        }
        macroblock_skip_mbfdf(0);
        skip_pending--;
    }
    if (end)
        break;
    if (first) {
        first = 0;
    } else {
        mb_pos_advance();
    }
    if (mbaff_frame_flag) {
        if (skip_pending) {
            macroblock_skip_mbfdf(1);
            mb_pos_advance();
            macroblock_layer_mbfdf(0);
            skip_pending = 0;
        } else {
            if (bottom) {
                macroblock_layer_mbfdf(0);
            } else {
                macroblock_layer_mbfdf(1);
            }
        }
        bottom = !bottom;
    } else {
        macroblock_layer_mbfdf(0);
    }
    if (entropy_coding_mode_flag) {
        if (mbaff_frame_flag && bottom) {
            end = 0;
        } else {
            end = cabac_terminate();
        }
    } else {
        end = !more_rbsp_data();
    }
    if (end)
        break;
}
trigger_intr(SLICE_DATA_DONE);


MBRING format 


Contents 


* MBRING format 


- Introduction
- Packet type 0: macroblock info
- Packet type 1: motion vectors
- Packet type 2: residual data
- Packet type 3: coded block mask
- Packet type 4: pred weight table


472 Chapter 2. nVidia hardware documentation 




Introduction 


An invocation of the SLICE DATA VLD command writes the decoded data into the MBRING. The MBRING is a ring
buffer located in VM memory, made of 32-bit-word-oriented packets. Each packet starts with a header word, whose
high 8 bits signify the packet type.


An invocation of the SLICE DATA command writes the following packets, in order:
* pred weight table [packet type 4] - if PRED WEIGHT TABLE command has been invoked previously
* for each macroblock [including skipped] in slice, in decoding order:
  - motion vectors [packet type 1] - if macroblock is not skipped and not intra coded
  - macroblock info [packet type 0] - always
  - residual data [packet type 2] - if at least one non-zero coefficient present
  - coded block mask [packet type 3] - if macroblock is not skipped


Packet type 0: macroblock info 


Packet is made of a header word and 3 or 6 payload words.
* Header word:
  - bits 0-23: number of payload words [3 or 6]
  - bits 24-31: packet type [0]
* Payload word 0:
  - bits 0-12: macroblock address
* Payload word 1:
  - bits 0-7: y coord in macroblock units
  - bits 8-15: x coord in macroblock units
* Payload word 2:
  - bit 0: first macroblock of a slice flag
  - bit 1: mb skip flag
  - bit 2: mb field coding flag
  - bits 3-8: mb type
  - bits 9+i*4 - 12+i*4, i < 4: sub mb type[i]
  - bit 25: transform size 8x8 flag
* Payload word 3:
  - bits 0-5: mb qp delta
  - bits 6-7: intra chroma pred mode
* Payload word 4:
  - bits i*4+0 - i*4+2, i < 8: rem intra pred mode[i]
  - bit i*4+3, i < 8: prev intra pred mode flag[i]
* Payload word 5:
  - bits i*4+0 - i*4+2, i < 8: rem intra pred mode[i+8]
  - bit i*4+3, i < 8: prev intra pred mode flag[i+8]


Packet has 3 payload words when macroblock is skipped, 6 when it's not skipped. This packet type is present 
for all macroblocks. The mb type and sub mb type values correspond to values used in CAVLC mode for cur- 
rent slice type - thus for example I NxN is mb type О when decoding I slices, mb type 5 when decoding P slices. 
For I NxN macroblocks encoded in 4x4 transform mode, rem, intra pred mode[i] and pred intra pred mode flag[i] 
correspond to rem intradx4 pred mode[i] and pred іпіта4х4 pred mode flag[i] for i = 0.15. For I NxN mac- 
roblocks encoded in 8x8 transform mode, rem intra pred mode[i] and pred intra pred mode flag[i] correspond to 
rem, intra8x8 pred mode[i] and pred intra8x8 pred mode flag[i] for i = 0..3, and are unused for i = 4..15. 
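A minimal Python sketch of decoding a type 0 payload (header word excluded) into the fields listed above. The dictionary keys are illustrative names chosen here, and all fields are returned raw: the text does not say whether mb_qp_delta is stored as a signed 6-bit value, so no sign extension is applied.

```python
def vp2_decode_mb_info(payload):
    """Decode the 3 or 6 payload words of a VP2 MBRING type 0 packet."""
    w0, w1, w2 = payload[:3]
    mb = {
        "mb_addr": w0 & 0x1FFF,
        "mb_y": w1 & 0xFF,
        "mb_x": (w1 >> 8) & 0xFF,
        "first_in_slice": w2 & 1,
        "mb_skip_flag": (w2 >> 1) & 1,
        "mb_field_flag": (w2 >> 2) & 1,
        "mb_type": (w2 >> 3) & 0x3F,
        "sub_mb_type": [(w2 >> (9 + i * 4)) & 0xF for i in range(4)],
        "transform_size_8x8_flag": (w2 >> 25) & 1,
    }
    if len(payload) == 6:  # non-skipped macroblock
        w3 = payload[3]
        mb["mb_qp_delta"] = w3 & 0x3F
        mb["intra_chroma_pred_mode"] = (w3 >> 6) & 3
        # words 4 and 5 hold 8 nibbles each: bits 0-2 rem mode, bit 3 prev flag
        nibbles = [(payload[4 + i // 8] >> ((i % 8) * 4)) & 0xF for i in range(16)]
        mb["rem_intra_pred_mode"] = [n & 7 for n in nibbles]
        mb["prev_intra_pred_mode_flag"] = [n >> 3 for n in nibbles]
    return mb
```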


Packet type 1: motion vectors 


Packet is made of two header words + 1 word for each motion vector.
* Header word:
- bits 0-23: number of motion vectors [always 0x20]
- bits 24-31: packet type [1]
* Second header word:
- bit i: bit 4 of ref_idx[i]
* Motion vector word i:
- bits 0-12: mvd[i] Y coord
- bits 13-27: mvd[i] X coord
- bits 28-31: bits 0-3 of ref_idx[i]

Indices 0..15 correspond to mvd_l0 and ref_idx_l0, indices 16..31 correspond to mvd_l1 and ref_idx_l1. Each index
corresponds to one 4x4 block, in the usual scan order for 4x4 blocks. Data is always included for all blocks: if a
macroblock/sub-macroblock partition size greater than 4x4 is used, its data is duplicated for all covered blocks.
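The following sketch decodes a complete type 1 packet, recombining bit 4 of each ref_idx from the second header word. The text does not say how negative mvd values are encoded; the sketch assumes two's complement within each field, with `sext` following the sign-extension convention of the notational conventions chapter (the second argument is the sign bit). Function and key names are hypothetical.

```python
def sext(value, bit):
    """Sign-extend `value`, treating `bit` as the sign bit."""
    value &= (2 << bit) - 1
    return value - (2 << bit) if (value >> bit) & 1 else value


def vp2_decode_mvs(packet):
    """Decode a full type 1 packet (both header words included)."""
    count = packet[0] & 0xFFFFFF          # always 0x20
    ref_bit4 = packet[1]
    mvs = []
    for i in range(count):
        w = packet[2 + i]
        mvs.append({
            "list": i // 16,                        # 0-15: l0, 16-31: l1
            "blk4x4": i % 16,                       # 4x4 block in scan order
            "mvd_y": sext(w & 0x1FFF, 12),          # assumed two's complement
            "mvd_x": sext((w >> 13) & 0x7FFF, 14),  # assumed two's complement
            "ref_idx": (w >> 28) | (((ref_bit4 >> i) & 1) << 4),
        })
    return mvs
```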


Packet type 2: residual data 


Packet is made of a header word + 1 halfword for each residual coefficient + 0 or 1 halfwords of padding to the next
multiple of word size.

* Header word:
- bits 0-23: number of residual coefficients
- bits 24-31: packet type [2]
* Payload halfword:
- bits 0-15: residual coefficient

For I_PCM macroblocks, this packet contains one coefficient for each pcm_sample_* element present in the bitstream,
stored in bitstream order.

For other types of macroblocks, this packet contains data for all blocks that have at least one non-zero coefficient.
If a block has a non-zero coefficient, all coefficients for this block, including zero ones, are stored in this packet.
Otherwise, the block is entirely skipped. The coefficients stored in this packet type are dezigzagged: their order
inside a single block corresponds to raster scan order. The blocks are stored in decoding order. The mask of blocks
stored in this packet is stored in packet type 3. If there are no non-zero coefficients in the whole macroblock, this
packet is not present.
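Unpacking the halfwords can be sketched as below (hypothetical function name). Two details are assumptions not stated in the text: that the first coefficient of each word sits in its low halfword, and that coefficients are 16-bit two's complement values.

```python
def vp2_residual_coeffs(packet):
    """Return the coefficients of a type 2 packet as signed integers."""
    count = packet[0] & 0xFFFFFF
    coeffs = []
    for i in range(count):
        # assumed: even-numbered coefficient in the low half of the word
        half = (packet[1 + i // 2] >> (16 * (i % 2))) & 0xFFFF
        coeffs.append(half - 0x10000 if half & 0x8000 else half)
    return coeffs
```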




Packet type 3: coded block mask 


Packet is made of a header word and a payload word.
* Header word:
- bits 0-23: number of payload words [1]
- bits 24-31: packet type [3]
* Payload word [4x4 mode]:
- bits 0-15: luma 4x4 blocks 0-15 [16 coords each]
- bit 16: Cb DC block [4 coords]
- bit 17: Cr DC block [4 coords]
- bits 18-21: Cb AC blocks 0-3 [15 coords each]
- bits 22-25: Cr AC blocks 0-3 [15 coords each]
* Payload word [8x8 mode]:
- bits 0-3: luma 8x8 blocks 0-3 [64 coords each]
- bit 4: Cb DC block [4 coords]
- bit 5: Cr DC block [4 coords]
- bits 6-9: Cb AC blocks 0-3 [15 coords each]
- bits 10-13: Cr AC blocks 0-3 [15 coords each]
* Payload word [intra 16x16 mode]:
- bit 0: luma DC block [16 coords]
- bits 1-16: luma AC blocks 0-15 [15 coords each]
- bit 17: Cb DC block [4 coords]
- bit 18: Cr DC block [4 coords]
- bits 19-22: Cb AC blocks 0-3 [15 coords each]
- bits 23-26: Cr AC blocks 0-3 [15 coords each]
* Payload word [PCM mode]: [all 0]

This packet stores the mask of blocks present in the preceding packet of type 2 [if any]. The bit corresponding to a block
is 1 if the block has at least one non-zero coefficient and is stored in the residual data packet, and 0 if all its
coefficients are zero and it's not stored in the residual data packet. This packet type is present for all non-skipped
macroblocks, including I_PCM macroblocks, but its payload word is always equal to 0 for I_PCM.
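Since every flagged block is stored with all of its coefficients, the coded block mask determines how many halfwords the preceding residual data packet must contain. This Python sketch computes that count from the layouts above and could serve as a consistency check; it does not apply to I_PCM macroblocks, whose mask is always 0. The `CBM_LAYOUT` table and function name are hypothetical.

```python
# (first bit, number of blocks, coefficients per block) for each mode,
# transcribed from the payload word layouts above
CBM_LAYOUT = {
    "4x4":    [(0, 16, 16), (16, 1, 4), (17, 1, 4), (18, 4, 15), (22, 4, 15)],
    "8x8":    [(0, 4, 64), (4, 1, 4), (5, 1, 4), (6, 4, 15), (10, 4, 15)],
    "i16x16": [(0, 1, 16), (1, 16, 15), (17, 1, 4), (18, 1, 4),
               (19, 4, 15), (23, 4, 15)],
}


def expected_residual_count(mask, mode):
    """Number of coefficients the type 2 packet should hold for `mask`."""
    total = 0
    for first, nblocks, ncoeffs in CBM_LAYOUT[mode]:
        for b in range(nblocks):
            if (mask >> (first + b)) & 1:
                total += ncoeffs
    return total
```

As a sanity check, a fully set mask yields 384 coefficients in all three modes (256 luma plus 2 * 64 chroma).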


Packet type 4: pred weight table 


Packet is made of a header word and a variable number of table write requests, each request being two words long.
* Header word:
- bits 0-23: number of write requests
- bits 24-31: packet type [4]
* Request word 0: table index to write
* Request word 1: data value to write

The pred weight table is treated as an array of 0x81 32-bit numbers. This packet is made of "write requests" which are
supposed to modify the table entries in the receiver.

The table indices are:

* Index i * 2, 0 <= i <= 0x1f:
- bits 0-7: luma_offset_l0[i]
- bits 8-15: luma_weight_l0[i]
- bit 16: chroma_weight_l0_flag[i]
- bit 17: luma_weight_l0_flag[i]
* Index i * 2 + 1, 0 <= i <= 0x1f:
- bits 0-7: chroma_offset_l0[i][1]
- bits 8-15: chroma_weight_l0[i][1]
- bits 16-23: chroma_offset_l0[i][0]
- bits 24-31: chroma_weight_l0[i][0]
* Index 0x40 + i * 2, 0 <= i <= 0x1f:
- bits 0-7: luma_offset_l1[i]
- bits 8-15: luma_weight_l1[i]
- bit 16: chroma_weight_l1_flag[i]
- bit 17: luma_weight_l1_flag[i]
* Index 0x40 + i * 2 + 1, 0 <= i <= 0x1f:
- bits 0-7: chroma_offset_l1[i][1]
- bits 8-15: chroma_weight_l1[i][1]
- bits 16-23: chroma_offset_l1[i][0]
- bits 24-31: chroma_weight_l1[i][0]
* Index 0x80:
- bits 0-2: chroma_log2_weight_denom
- bits 3-5: luma_log2_weight_denom

The requests are emitted in the following order:

* for 0 <= i <= num_ref_idx_l0_active_minus1: 2*i, 2*i + 1
* 0x80
* for 0 <= i <= num_ref_idx_l1_active_minus1: 0x40 + 2*i, 0x40 + 2*i + 1

The fields corresponding to data not present in the bitstream are set to 0; they're not set to their inferred values.
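The receiving side can be modeled as a table that the write requests update in place. The sketch below, with hypothetical function names, applies a type 4 packet to a fresh 0x81-entry table and decodes a few of the fields listed above; offsets and weights are returned raw, without sign handling.

```python
def vp2_apply_pred_weight(packet, table=None):
    """Apply the write requests of a type 4 packet to a 0x81-entry table.
    Entries that are never written stay 0."""
    if table is None:
        table = [0] * 0x81
    for r in range(packet[0] & 0xFFFFFF):
        index, value = packet[1 + 2 * r], packet[2 + 2 * r]
        table[index] = value
    return table


def luma_fields(table, lst, i):
    """Raw luma fields for reference i of list lst (0 or 1)."""
    v = table[0x40 * lst + 2 * i]
    return {"offset": v & 0xFF, "weight": (v >> 8) & 0xFF,
            "chroma_flag": (v >> 16) & 1, "luma_flag": (v >> 17) & 1}


def log2_denoms(table):
    """(luma_log2_weight_denom, chroma_log2_weight_denom) from entry 0x80."""
    return (table[0x80] >> 3) & 7, table[0x80] & 7
```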


VP2 command macro processor 
Contents

* VP2 command macro processor
- Introduction
- MMIO registers
- Control and status registers
- Interrupts
- FIFOs
- Commands
- Execution state and registers
  * Code RAM
  * Execution control
  * Parameter registers
  * Global registers
  * Special registers
  * The LUT
- Opcodes
  * Command opcodes
  * Data opcodes
  * Destination write


Introduction 


The VP2 macro processor is a small programmable processor that can emit vector processor commands when triggered
by special commands from xtensa. All vector commands first go through the macro processor, which checks whether
they're in the macro command range, and either passes them down to the vector processor, or interprets them itself,
possibly launching a macro and submitting other vector commands. It is one of the four major blocks making up the PVP2
engine.

The macro processor has:

* 64-bit VLIW opcodes, controlling two separate execution paths, one primarily for processing/emitting commands,
  the other for command parameters
* dedicated code RAM, 512 64-bit words in size
* 32 * 32-bit word LUT data space, RW by host and RO by the macro code
* 6 32-bit global [not banked] GPRs visible to macro code and host [$g0-$g5]
* 8 32-bit banked GPRs visible to macro code and host, meant for passing parameters - one bank is writable by
  the param commands, the other is in use by macro code at any time [$p0-$p7]
* 3 1-bit predicates, with conditional execution [$p1-$p3]
* instruction set consisting of bit operations, shifts, and 16-bit addition
* no branch/loop capabilities
* a 32-bit command path accumulator [$cacc]
* a 32-bit data path accumulator [$dacc]
* a 5-bit LUT address register [$lutidx]
* 15-bit command, 32-bit data, and 8-bit high data registers for command submission [$cmd, $data, $datahi]
* 64-entry input command FIFO
* 2-entry output command FIFO
* a single hardware breakpoint


MMIO registers 


The macro processor registers occupy the 0x00f600:0x00f700 range in BAR0 space, corresponding to the 0x2c000:0x2e000
range in PVP2's XLMI space. They are:

XLMI              MMIO                Name             Description
0x2c000           0x00f600            CONTROL          master control
0x2c100           0x00f608            STATUS           detailed status
0x2c180           0x00f60c            IDLE             a busy/idle status
0x2c200           0x00f610            INTR_EN          interrupt enable
0x2c280           0x00f614            INTR             interrupt status
0x2c300           0x00f618            BREAKPOINT       breakpoint address and enable
0x2c800:0x2c880   0x00f640            LUT[0:32]        the LUT data
0x2c880:0x2c8a0   0x00f644            PARAM_A[0:8]     $p bank A
0x2c900:0x2c920   0x00f648            PARAM_B[0:8]     $p bank B
0x2c980:0x2c9a0   0x00f64c            GLOBAL[0:8]      $g registers
0x2cb80           0x00f65c            PARAM_SEL        $p bank selection switch
0x2cc00           0x00f660            RUNNING          code execution in progress switch
0x2cc80           0x00f664            PC               program counter
0x2cd00           0x00f668            DATAHI           $datahi register
0x2cd80           0x00f66c            LUTIDX           $lutidx register
0x2ce00           0x00f670            CACC             $cacc register
0x2ce80           0x00f674            CMD              $cmd register
0x2cf00           0x00f678            DACC             $dacc register
0x2cf80           0x00f67c            DATA             $data register
0x2d000           0x00f680            IFIFO_DATA       input FIFO data
0x2d080           0x00f684            IFIFO_ADDR       input FIFO command
0x2d100           0x00f688            IFIFO_TRIGGER    input FIFO manual read/write trigger
0x2d180           0x00f68c            IFIFO_SIZE       input FIFO size limiter
0x2d200           0x00f690            IFIFO_STATUS     input FIFO status
0x2d280           0x00f694            OFIFO_DATA       output FIFO data
0x2d300           0x00f698            OFIFO_ADDR       output FIFO command & high data
0x2d380           0x00f69c            OFIFO_TRIGGER    output FIFO manual read/write trigger
0x2d400           0x00f6a0            OFIFO_SIZE       output FIFO size limiter
0x2d480           0x00f6a4            OFIFO_STATUS     output FIFO status
0x2d780           0x00f6bc            CODE_SEL         selects high or low part of code RAM for code window
0x2d800:0x2e000   0x00f6c0:0x00f700   CODE             a 256-word window to code space
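The two address columns of the table follow a regular pattern: each register occupies 0x80 bytes of XLMI space and one word of BAR0 space, so MMIO = 0x00f600 + (XLMI - 0x2c000) / 0x20. The Python sketch below encodes that pattern. It is inferred from the table rather than taken from a hardware specification, and reading the "[index i]" notation used later as the word offset inside the 0x80-byte XLMI block is an assumption. The function name is hypothetical.

```python
def xlmi_to_bar0(xlmi):
    """Translate a macro processor XLMI address to (BAR0 address, index)."""
    if not 0x2C000 <= xlmi < 0x2E000:
        raise ValueError("outside the macro processor XLMI range")
    offset = xlmi - 0x2C000
    # each 0x80-byte XLMI block maps to one 32-bit BAR0 register;
    # the word offset within the block is treated as the register index
    return 0x00F600 + (offset >> 7) * 4, (offset >> 2) & 0x1F
```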


Control and status registers 
Todo: write me


Interrupts 


Todo: write me 


FIFOs 


Todo: write me 


Commands 


The macro processor processes commands in the 0xc000-0xdfff range from the input FIFO, passing down all other
commands directly to the output FIFO [provided that no macro is executing at the moment]. The macro processor
commands are:

Command        Name                  Description
0xc000+i*4     MACRO_PARAM[0:8]      write to $p host register bank
0xc020+i*4     MACRO_GLOBAL[0:8]     write to $g registers
0xc080+i*4     MACRO_LUT[0:32]       write to given LUT entry
0xc100         MACRO_EXEC            execute a macro
0xc200         MACRO_DATAHI          write to $datahi register
0xd000+i*4     MACRO_CODE[0:0x400]   upload half of a code word
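The command ranges in the table can be classified with a small lookup like the following Python sketch (hypothetical function name). How the hardware treats addresses inside 0xc000-0xdfff that are not listed in the table is not documented here, so the sketch reports them as unknown.

```python
def decode_macro_cmd(cmd):
    """Return (name, index) for a macro processor command, or None if the
    command is outside 0xc000-0xdfff and is passed down unchanged."""
    if not 0xC000 <= cmd <= 0xDFFF:
        return None
    if cmd < 0xC020:
        return "MACRO_PARAM", (cmd - 0xC000) >> 2
    if cmd < 0xC040:
        return "MACRO_GLOBAL", (cmd - 0xC020) >> 2
    if 0xC080 <= cmd < 0xC100:
        return "MACRO_LUT", (cmd - 0xC080) >> 2
    if cmd == 0xC100:
        return "MACRO_EXEC", None
    if cmd == 0xC200:
        return "MACRO_DATAHI", None
    if cmd >= 0xD000:
        return "MACRO_CODE", (cmd - 0xD000) >> 2
    return "UNKNOWN", None   # not listed in the table above
```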


Execution state and registers 
Code RAM 


The code RAM contains 512 opcodes. Opcodes are 64 bits long and are accessible by the host as pairs of 32-bit words.
Code may be read or written using the MMIO window:

BAR0 0x00f6bc / XLMI 0x2d780: CODE_SEL 1-bit RW register. Writing 0 selects code RAM entries 0:0x100 to
be mapped to the CODE window, writing 1 selects code RAM entries 0x100:0x200.

BAR0 0x00f6c0 + (i >> 5) * 4 [index i & 0x1f] / XLMI 0x2d800 + i * 4, i < 0x200: CODE[i] The code window.
Reading or writing CODE[i] is equivalent to reading or writing the low [if i is even] or high [if i is odd] 32 bits of
code RAM cell i >> 1 | CODE_SEL << 8.

The code RAM can also be written in a pipelined manner by the MACRO_CODE command:

VP command 0xd000 + i * 4, i < 0x400: MACRO_CODE[i] Write the parameter to the low [if i is even] or high [if i is
odd] 32 bits of code RAM cell i >> 1. If a macro is currently executing, execution of this command is blocked
until it finishes. Valid only on macro input FIFO.
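The index arithmetic for both access paths is sketched below in Python; the helper names are hypothetical. `macro_code_commands` shows how a single 64-bit opcode would be uploaded as two MACRO_CODE commands, low half first (the order of the two writes is an assumption, since the text does not say whether it matters).

```python
def code_window_target(code_sel, i):
    """CODE[i] (i < 0x200) -> (code RAM cell, True if the high 32 bits)."""
    return (i >> 1) | (code_sel << 8), bool(i & 1)


def macro_code_commands(cell, opcode):
    """(VP command, data) pairs writing a 64-bit opcode into code RAM cell `cell`."""
    return [
        (0xD000 + (cell * 2) * 4, opcode & 0xFFFFFFFF),   # low half, even i
        (0xD000 + (cell * 2 + 1) * 4, opcode >> 32),      # high half, odd i
    ]
```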
Execution control


Todo: write me 


Parameter registers 


Parameter registers serve a dual purpose: they're meant for passing parameters to macros, but can also be used as GPRs
by the code. There are two banks of parameter registers, bank A and bank B. Each bank contains 8 32-bit registers. At
any time, one of the banks is in use by the macro code, while the other can be written by the host via MACRO_PARAM
commands for the next macro execution. Each time a macro is launched, the bank assignments are swapped. The current
assignment is controlled by the PARAM_SEL register:

BAR0 0x00f65c / XLMI 0x2cb80: PARAM_SEL 1-bit RW register. Can be set to one of:
* 0: CODE_A_CMD_B - bank A is in use by the macro code, commands will write to bank B
* 1: CODE_B_CMD_A - bank B is in use by the macro code, commands will write to bank A
This register is toggled on every MACRO_EXEC command execution.
The parameter register banks can be accessed through MMIO registers:

BAR0 0x00f644 [index i] / XLMI 0x2c880 + i * 4, i < 8: PARAM_A[i]
BAR0 0x00f648 [index i] / XLMI 0x2c900 + i * 4, i < 8: PARAM_B[i]

These MMIO registers are mapped straight to the corresponding parameter registers.
The bank not currently in use by code can also be written by MACRO_PARAM commands:

VP command 0xc000 + i * 4, i < 8: MACRO_PARAM[i] Write the command data to parameter register i of the
bank currently not assigned to the macro code. Execution of this command won't wait for the current macro
execution to finish. Valid only on macro input FIFO.

The parameter registers are visible to the macro code as GPR registers 0-7.
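The bank swapping can be modeled in a few lines. This Python sketch only illustrates the rules above and says nothing about how the hardware implements them; the class and method names are hypothetical.

```python
class ParamBanks:
    """Model of the two $p banks and the PARAM_SEL switch."""

    def __init__(self):
        self.banks = {"A": [0] * 8, "B": [0] * 8}
        self.param_sel = 0  # 0: CODE_A_CMD_B, 1: CODE_B_CMD_A

    def code_bank(self):
        """Bank seen by macro code as GPRs 0-7."""
        return self.banks["A" if self.param_sel == 0 else "B"]

    def cmd_bank(self):
        """Bank written by MACRO_PARAM commands."""
        return self.banks["B" if self.param_sel == 0 else "A"]

    def macro_param(self, i, data):
        """VP command 0xc000 + i * 4."""
        self.cmd_bank()[i] = data & 0xFFFFFFFF

    def macro_exec(self):
        """MACRO_EXEC toggles PARAM_SEL, handing the freshly written bank to code."""
        self.param_sel ^= 1
```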


Global registers 


There are 6 normal global registers, $g0-$g5. They are simply 32-bit GPRs for use by macro computations. There are
also two special global pseudo-registers, $g6 and $g7.

$g6 is the LUT readout register. Any attempt to read from it will read from the LUT entry selected by the $lutidx
register. Any attempt to write to it will be ignored.

$g7 is the special predicate register, $pred. Its 4 low bits are mapped to the four predicates, $p0-$p3. Any attempt to
read from this register will read the predicates, and fill the high 28 bits with zeros. Any attempt to write this register
will write the predicates.

$p0 is always forced to 1, while $p1-$p3 are writable. The predicates are used for conditional execution in macro
code. In addition to access through $pred, the predicates can also be written by macro code individually as a result of
various operations.

All 8 global registers are accessible through MMIO and the command stream:

BAR0 0x00f64c [index i] / XLMI 0x2c980 + i * 4, i < 8: GLOBAL[i] These registers are mapped straight to the
corresponding global registers.

VP command 0xc020 + i * 4, i < 8: MACRO_GLOBAL[i] Write the command data to global register i. If a macro
is currently executing, execution of this command is blocked until it finishes. Valid only on macro input FIFO.

The global registers are visible to the macro code as GPR registers 8-15.
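Putting the pieces together, the 16-entry GPR space seen by macro code can be sketched as follows. The class is hypothetical and models only the read/write rules stated above: indices 0-7 go to the parameter bank currently assigned to code, 8-13 are $g0-$g5, reads of $g6 return the LUT entry at $lutidx, and $g7 exposes the predicates with $p0 pinned to 1.

```python
class MacroGprs:
    """Sketch of macro-visible GPR reads and writes."""

    def __init__(self, params, lut):
        self.params = params      # 8 ints: the $p bank currently used by code
        self.globals = [0] * 6    # $g0-$g5
        self.lut = lut            # 32 ints
        self.lutidx = 0
        self.pred = 0b0001        # $p0 is always 1

    def read(self, idx):
        if idx < 8:
            return self.params[idx]
        if idx < 14:
            return self.globals[idx - 8]
        if idx == 14:             # $g6: LUT readout
            return self.lut[self.lutidx]
        return self.pred          # $g7: high 28 bits read as 0

    def write(self, idx, value):
        value &= 0xFFFFFFFF
        if idx < 8:
            self.params[idx] = value
        elif idx < 14:
            self.globals[idx - 8] = value
        elif idx == 15:
            self.pred = (value & 0xF) | 1   # $p0 stays forced to 1
        # idx == 14 ($g6): writes are ignored
```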


Special registers 


In addition to the GPRs, the macro code can use 6 special registers. There are 4 special registers belonging to the
command execution path, identified by a 2-bit index:

* 0: $cacc, command accumulator
* 1: $cmd, output command register
* 2: $lutidx, LUT index
* 3: $datahi, output high data register
There are also 2 special registers belonging to the data execution path, identified by a 1-bit index:
* 0: $dacc, data accumulator
* 1: $data, output data register

The $cacc and $dacc registers are 32-bit and can be read back by the macro code, and so are usable for general purpose
computations.

The $cmd, $data, and $datahi registers are write-only by the macro code, and their contents are submitted to the macro
output FIFO when a submit opcode is executed. $data is 32-bit; $datahi is 8-bit, mapping to bits 0-7 of written values.
$cmd is 15-bit, mapping to bits 2-16 of written values. The $datahi register is also used to fill the high data bits in
the output FIFO whenever a command is bypassed from the input FIFO.

The $lutidx register is 5-bit and write-only by the macro code. It maps to bits 0-4 of written values. Its value selects
the LUT entry visible in the $g6 pseudo-register.

All 6 special registers can be accessed through MMIO, and the $datahi register can be additionally set by a command:

MMIO 0x00f668 / XLMI 0x2cd00: DATAHI
MMIO 0x00f66c / XLMI 0x2cd80: LUTIDX
MMIO 0x00f670 / XLMI 0x2ce00: CACC
MMIO 0x00f674 / XLMI 0x2ce80: CMD
MMIO 0x00f678 / XLMI 0x2cf00: DACC
MMIO 0x00f67c / XLMI 0x2cf80: DATA

These registers map directly to the corresponding special registers. For $cacc, $dacc, and $data, all bits are
valid. For $cmd, bits 2-16 are valid. For $lutidx, bits 0-4 are valid. For $datahi, bits 0-7 are valid.
Remaining bits are forced to 0.

VP command 0xc200: MACRO_DATAHI Sets $datahi to the low 8 bits of the command data. If a macro is currently
executing, execution of this command is blocked until it finishes. Valid only on macro input FIFO.
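The valid-bit rules can be captured as masks applied on write. A minimal sketch, assuming (as stated for the MMIO view) that the invalid bits simply read back as 0; the table and function names are hypothetical.

```python
# Valid-bit masks of the special registers, per the text above.
SPECIAL_MASKS = {
    "CACC": 0xFFFFFFFF,
    "DACC": 0xFFFFFFFF,
    "DATA": 0xFFFFFFFF,
    "CMD": 0x1FFFC,    # bits 2-16: 15-bit command address
    "LUTIDX": 0x1F,    # bits 0-4: selects one of 32 LUT entries
    "DATAHI": 0xFF,    # bits 0-7
}


def write_special(regs, name, value):
    """Store `value` into special register `name`, dropping invalid bits."""
    regs[name] = value & SPECIAL_MASKS[name]
    return regs[name]
```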


The LUT 


The LUT is a small indexable RAM that's read-only by the macro code, but freely writable by the host. It's made of
32 32-bit words. The LUT entry selected by the $lutidx register can be read by macro code simply by reading from the
$g6 pseudo-register. The LUT can be accessed by the host through MMIO and the command stream:

BAR0 0x00f640 [index i] / XLMI 0x2c800 + i * 4, i < 32: LUT[i] These registers are mapped straight to the
corresponding LUT entries.

VP command 0xc080 + i * 4, i < 32: MACRO_LUT[i] Write the command data to LUT entry i. If a macro is
currently executing, execution of this command is blocked until it finishes. Valid only on macro input FIFO.

Opcodes


The code opcodes are 64 bits long. They're divided into several major parts:
* bits 0-2: conditional execution predicate selection.
- bits 0-1: PRED, the predicate to use [selected from $p0-$p3]
- bit 2: PNOT, selects whether the predicate is negated before use.
* bit 3: EXIT, exit flag
* bit 4: SUBMIT, submit flag
* bits 5-30: command opcode
* bits 31-32: PDST, predicate destination [selected from $p0-$p3]
* bits 33-63: data opcode
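A sketch of splitting an opcode into these major fields with plain shifts and masks (hypothetical function name). The command and data opcodes are returned right-aligned here; note that the field descriptions in the following sections use absolute bit positions within the 64-bit opcode.

```python
def split_opcode(op):
    """Split a 64-bit macro opcode into its major fields."""
    return {
        "PRED": op & 3,                          # bits 0-1
        "PNOT": (op >> 2) & 1,                   # bit 2
        "EXIT": (op >> 3) & 1,                   # bit 3
        "SUBMIT": (op >> 4) & 1,                 # bit 4
        "cmd_op": (op >> 5) & ((1 << 26) - 1),   # bits 5-30
        "PDST": (op >> 31) & 3,                  # bits 31-32
        "data_op": (op >> 33) & ((1 << 31) - 1), # bits 33-63
    }
```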


When a macro is launched, opcodes are executed sequentially from the macro start address until an opcode with the
EXIT flag set is executed. An opcode is executed as follows:

1. If the SUBMIT bit is set, the current values of $cmd, $data, $datahi are sent to the output FIFO.
2. Conditional execution status is determined: the predicate selected by PRED is read. If PNOT is set to 0,
   conditional execution will be enabled if the predicate is set to 1. Otherwise [PNOT set to 1], conditional execution
   will be enabled if the predicate is set to 0. Unconditional opcodes are simply opcodes using the non-negated
   predicate $p0 [PRED = 0, PNOT = 0].
3. If the SUBMIT bit is set, conditional execution is enabled, and ($cmd & 0x1fe80) == 0xb000 [i.e. the submitted
   command was in the 0xb000-0xb07c or 0xb100-0xb17c ranges, corresponding to vector processor param
   commands], $cmd is incremented by 4. This enables submitting several parameters in a row without having to
   update the $cmd register.
4. If conditional execution is enabled, the command opcode is executed, and the command result, command predicate
   result, and the C2D intermediate value are computed.
5. If conditional execution is enabled, the data opcode is executed, and the data result and data predicate result are
   computed.
6. If conditional execution is enabled, the command and data results are written to their destination registers.
7. If the EXIT bit is set, macro execution halts.

Effectively, conditional execution affects all computations [including the automatic $cmd increment], but doesn't
affect submitting or exiting.
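Steps 1-3 can be sketched as below. This is a simplified Python model with hypothetical function names: it tracks only $cmd, not $data/$datahi. It shows the two points that are easy to get wrong: the submit in step 1 ignores the predicate, while the $cmd auto-increment in step 3 does not.

```python
def condition_enabled(pred, pred_sel, pnot):
    """Step 2: `pred` is the 4-bit $pred value (bit 0, $p0, is always 1)."""
    return ((pred >> pred_sel) & 1) != pnot


def submit_and_autoinc(cmd, submit, enabled):
    """Steps 1 and 3 for $cmd only; returns (submitted $cmd or None, new $cmd)."""
    submitted = cmd if submit else None          # step 1 ignores the predicate
    if submit and enabled and (cmd & 0x1FE80) == 0xB000:
        cmd += 4                                 # 0xb000-0xb07c, 0xb100-0xb17c
    return submitted, cmd
```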


Command opcodes 


The command processing path is mainly meant for processing commands and data going to the $lutidx/$datahi registers,
but can also exchange data with the data processing path if needed.

The command opcode bitfields are:

* bits 5-9: CBFSTART - bitfield start [CINSRT_R, CINSRT_I, some data ops]
* bits 10-14: CBFEND - bitfield end [CINSRT_R, CINSRT_I, some data ops]
* bits 15-19: CSHIFT - shift count [CINSRT_R]
* bit 20: CSHDIR - shift direction [CINSRT_R]
* bits 15-20: CIMM6 - 6-bit unsigned immediate [CINSRT_I]
* bits 21-22: CSRC2 - selects command source #2 [CINSRT_I, CINSRT_R], one of:
- 0: ZERO, source #2 is 0
- 1: CACC, source #2 is current value of $cacc
- 2: DACC, source #2 is current value of $dacc
- 3: GPR, source #2 is same as command source #1
* bits 15-22: CIMM8 - 8-bit unsigned immediate [CEXTRADD8]
* bits 5-22: CIMM18 - 18-bit signed immediate [CMOV_I]
* bits 23-26: CSRC1 - selects command source #1 [CINSRT_R, CEXTRADD8, DSHIFT_R, DADD16_R]. The
  command source #1 is the GPR with index selected by this bitfield.
* bits 27-28: CDST - the command destination, determines where the command result will be written; one of:
- 0: CACC
- 1: CMD
- 2: LUTIDX
- 3: DATAHI
* bits 29-30: COP - the command operation, one of:
- 0: CINSRT_R, bitfield insertion with shift, register sources
- 1: CINSRT_I, bitfield insertion with 6-bit immediate source
- 2: CMOV_I, 18-bit immediate value load
- 3: CEXTRADD8, bitfield extraction + 8-bit immediate addition

The command processing path computes four values for further processing:
* the command result, i.e. the 32-bit value that will later be written to the command destination register
* the command predicate result, i.e. the 1-bit value that may later be written to the destination predicate
* the C2D value, a 32-bit intermediate result used in some data opcodes
* the command bitfield mask [CBFMASK], a 32-bit value used in some command and data opcodes

The command bitfield mask is used by the bitfield insertion operations. It is computed from the command bitfield start
and end as follows:

if (CBFEND >= CBFSTART) {
    CBFMASK = (2 << CBFEND) - (1 << CBFSTART); // bits CBFSTART-CBFEND are 1
} else {
    CBFMASK = 0;
}

Since the CBFEND and CBFSTART fields conflict with the CIMM18 field, the data ops using the command mask should
not be used together with the CMOV_I operation.

The CINSRT_R operation has the following semantics:

if (CSHDIR == 0) /* 0 is left shift, 1 is right logical shift */
    shifted_source = command_source_1 << CSHIFT;
else
    shifted_source = command_source_1 >> CSHIFT;
C2D = command_result = (shifted_source & CBFMASK) | (command_source_2 & ~CBFMASK);
command_predicate_result = (shifted_source & CBFMASK) == 0;

The CINSRT_I operation has the following semantics:

C2D = command_result = (CIMM6 << CBFSTART & CBFMASK) | (command_source_2 & ~CBFMASK);
command_predicate_result = 0;

The CMOV_I operation has the following semantics:

C2D = command_result = sext(CIMM18, 17); /* sign-extend 18-bit immediate to 32 bits */
command_predicate_result = 0;

The CEXTRADD8 operation has the following semantics:

C2D = (command_source_1 & CBFMASK) >> CBFSTART;
command_result = ((C2D + CIMM8) & 0xff) | (C2D & ~0xff); /* add immediate to low 8 bits of extracted value */
command_predicate_result = 0;
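The command path can be transcribed into Python as below. This is a sketch of the pseudocode above, not a cycle-accurate model: `gpr` is a caller-supplied lookup function (an assumption of this sketch), all values are kept as unsigned 32-bit integers, and left shifts are truncated to 32 bits. The helper names are hypothetical.

```python
M32 = 0xFFFFFFFF


def bits(op, lo, hi):
    """Opcode bits lo..hi, inclusive."""
    return (op >> lo) & ((1 << (hi - lo + 1)) - 1)


def sext(value, bit):
    """Sign-extend `value`, treating `bit` as the sign bit."""
    value &= (2 << bit) - 1
    return value - (2 << bit) if (value >> bit) & 1 else value


def bf_mask(start, end):
    """CBFMASK rule: bits start..end set, or 0 if end < start."""
    return ((2 << end) - (1 << start)) & M32 if end >= start else 0


def command_path(op, gpr, cacc, dacc):
    """Return (command_result, command_predicate, C2D, CBFMASK)."""
    cbfstart, cbfend = bits(op, 5, 9), bits(op, 10, 14)
    mask = bf_mask(cbfstart, cbfend)
    src1 = gpr(bits(op, 23, 26))
    src2 = (0, cacc, dacc, src1)[bits(op, 21, 22)]
    cop = bits(op, 29, 30)
    if cop == 0:  # CINSRT_R
        shift = bits(op, 15, 19)
        if bits(op, 20, 20) == 0:
            shifted = (src1 << shift) & M32
        else:
            shifted = src1 >> shift          # logical right shift
        res = (shifted & mask) | (src2 & ~mask & M32)
        return res, int((shifted & mask) == 0), res, mask
    if cop == 1:  # CINSRT_I
        res = ((bits(op, 15, 20) << cbfstart) & mask) | (src2 & ~mask & M32)
        return res, 0, res, mask
    if cop == 2:  # CMOV_I
        res = sext(bits(op, 5, 22), 17) & M32
        return res, 0, res, mask
    # CEXTRADD8
    c2d = (src1 & mask) >> cbfstart
    res = ((c2d + bits(op, 15, 22)) & 0xFF) | (c2d & ~0xFF)
    return res, 0, c2d, mask
```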


Data opcodes 


The data processing path is mainly meant for processing command data, but can also exchange data with the
command processing path if needed.

The data opcode bitfields are:

* bits 33-37: DBFSTART - bitfield start [DINSRT_R, DINSRT_I, DSEXT]
* bits 38-42: DBFEND - bitfield end [DINSRT_R, DINSRT_I, DSEXT]
* bits 43-47: DSHIFT - shift count and SEXT bit position [DINSRT_R, DSEXT]
* bit 48: DSHDIR - shift direction [DINSRT_R, DSHIFT_R]
* bits 43-48: DIMM6 - 6-bit unsigned immediate [DINSRT_I]
* bits 33-48: DIMM16 - 16-bit immediate [DADD16_I, DLOGOP16_I]
* bit 49: C2DEN - enables double bitfield insertion, using the C2D value [DINSRT_R, DINSRT_I, DSEXT]
* bit 49: DDSTSKIP - skips the DDST write if set [DADD16_I]
* bit 49: DSUB - selects whether the DADD16_R operation does an addition or a subtraction
* bits 49-50: DLOGOP - the DLOGOP16_I suboperation, one of:
- 0: MOV, result is set to immediate
- 1: AND, result is source ANDed with the immediate
- 2: OR, result is source ORed with the immediate
- 3: XOR, result is source XORed with the immediate
* bits 50-51: DSRC2 - selects data source #2 [DINSRT_R, DINSRT_I], one of:
- 0: ZERO, source #2 is 0
- 1: CACC, source #2 is current value of $cacc
- 2: DACC, source #2 is current value of $dacc
- 3: GPR, source #2 is same as data source #1
* bit 50: DHI2 - selects low or high 16 bits of the second operand [DADD16_R]
* bit 51: DHI - selects low or high 16 bits of an operand [DADD16_I, DLOGOP16_I, DADD16_R]
* bits 52-55: DSRC1 - selects data source #1 [DINSRT_R, DINSRT_I, DADD16_I, DLOGOP16_I, DSHIFT_R,
  DSEXT, DADD16_R]. The data source #1 is the GPR with index selected by this bitfield.
* bits 33-55: DIMM23 - 23-bit signed immediate [DMOV_I]
* bits 56-59: DRDST - selects the data GPR destination register. The GPR destination is the GPR with index selected
  by this bitfield. The data result will be written here, along with the special register selected by DDST.
* bit 60: DDST - the data special register destination, determines where the data result will be written (along with
  DRDST); one of:
- 0: DACC
- 1: DATA
* bits 61-63: DOP - the data operation, one of:
- 0: DINSRT_R, bitfield insertion with shift, register sources
- 1: DINSRT_I, bitfield insertion with 6-bit immediate source
- 2: DMOV_I, 23-bit immediate value load
- 3: DADD16_I, 16-bit addition with immediate
- 4: DLOGOP16_I, 16-bit logic operation with immediate
- 5: DSHIFT_R, shift by the value of a register
- 6: DSEXT, sign extension
- 7: DADD16_R, 16-bit addition/subtraction with register operands
The data processing path computes three values:
* the data result, i.e. the 32-bit value that will be written to the data destination registers
* the data predicate result, i.e. the 1-bit value that will be written to the destination predicate
* the skip special destination flag, a 1-bit flag that disables the write to the data special register if set

Not all data operations produce a predicate result. For ones that don't, the command predicate result will be output
instead.


The DINSRT_R operation has the following semantics:

if (DBFEND >= DBFSTART) {
    DBFMASK = (2 << DBFEND) - (1 << DBFSTART); // bits DBFSTART-DBFEND are 1
} else {
    DBFMASK = 0;
}
if (DSHDIR == 0) /* 0 is left shift, 1 is right arithmetic shift */
    shifted_source = data_source_1 << DSHIFT;
else
    shifted_source = (-1 << 32 | data_source_1) >> DSHIFT;
data_result = (data_source_2 & ~DBFMASK) | (shifted_source & DBFMASK);
if (C2DEN)
    data_result = (data_result & ~CBFMASK) | (C2D & CBFMASK);
data_predicate_result = (shifted_source & DBFMASK) == 0;
skip_special_destination = false;
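DINSRT_R can be transcribed directly into Python. The sketch below (hypothetical function name) takes the already-selected source values rather than decoding an opcode, and reproduces the right-shift expression from the pseudocode literally; as written, that expression shifts ones in above bit 31 regardless of the source value, which this sketch does not try to reinterpret.

```python
M32 = 0xFFFFFFFF


def bf_mask(start, end):
    """Bits start..end set, or 0 if end < start."""
    return ((2 << end) - (1 << start)) & M32 if end >= start else 0


def dinsrt_r(src1, src2, dbfstart, dbfend, dshift, dshdir,
             c2den=0, c2d=0, cbfmask=0):
    """Return (data_result, data_predicate_result) for DINSRT_R."""
    dmask = bf_mask(dbfstart, dbfend)
    if dshdir == 0:
        shifted = (src1 << dshift) & M32
    else:
        # literal (-1 << 32 | data_source_1) >> DSHIFT, truncated to 32 bits
        shifted = ((((-1) << 32) | src1) >> dshift) & M32
    res = (src2 & ~dmask & M32) | (shifted & dmask)
    if c2den:  # second insertion, using the command path's C2D and CBFMASK
        res = (res & ~cbfmask & M32) | (c2d & cbfmask)
    return res, int((shifted & dmask) == 0)
```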


The DINSRT_I operation has the following semantics:

if (DBFEND >= DBFSTART) {
    DBFMASK = (2 << DBFEND) - (1 << DBFSTART); // bits DBFSTART-DBFEND are 1
} else {
    DBFMASK = 0;
}
data_result = (data_source_2 & ~DBFMASK) | (DIMM6 << DBFSTART & DBFMASK);
if (C2DEN)
    data_result = (data_result & ~CBFMASK) | (C2D & CBFMASK);
data_predicate_result = command_predicate_result;
skip_special_destination = false;

The DMOV_I operation has the following semantics:

data_result = sext(DIMM23, 22); /* sign-extend 23-bit immediate to 32 bits */
data_predicate_result = command_predicate_result;
skip_special_destination = false;

The DADD16_I operation has the following semantics:

sum = ((data_source_1 >> (16 * DHI)) + DIMM16) & 0xffff;
data_result = (data_source_1 & ~(0xffff << (16 * DHI))) | sum << (16 * DHI);
data_predicate_result = sum >> 15 & 1;
skip_special_destination = DDSTSKIP;

The DLOGOP16_I operation has the following semantics:

src = (data_source_1 >> (16 * DHI)) & 0xffff;
switch (DLOGOP) {
    case MOV: res = DIMM16; break;
    case AND: res = src & DIMM16; break;
    case OR: res = src | DIMM16; break;
    case XOR: res = src ^ DIMM16; break;
}
data_result = (data_source_1 & ~(0xffff << (16 * DHI))) | res << (16 * DHI);
data_predicate_result = (res == 0);
skip_special_destination = false;

The DSHIFT_R operation has the following semantics:

shift = command_source_1 & 0x1f;
if (DSHDIR == 0) /* 0 is left shift, 1 is right arithmetic shift */
    data_result = data_source_1 << shift;
else
    data_result = (-1 << 32 | data_source_1) >> shift;
data_predicate_result = command_predicate_result;
skip_special_destination = false;

The DSEXT operation has the following semantics:

bfstart = max(DBFSTART, DSHIFT);
if (DBFEND >= bfstart) {
    DBFMASK = (2 << DBFEND) - (1 << bfstart); // bits bfstart-DBFEND are 1
} else {
    DBFMASK = 0;
}
sign = data_source_2 >> DSHIFT & 1;
data_result = (data_source_2 & ~DBFMASK) | (sign ? DBFMASK : 0);
if (C2DEN)
    data_result = (data_result & ~CBFMASK) | (C2D & CBFMASK);
data_predicate_result = sign;
skip_special_destination = false;
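DSEXT is the usual way to sign-extend a field. The sketch below (hypothetical function name) transcribes it; with DBFSTART = 0, DSHIFT = 7 and DBFEND = 31 it sign-extends the low byte of data source #2 to 32 bits, because the mask covers bits 7-31 and is filled with copies of bit 7.

```python
M32 = 0xFFFFFFFF


def dsext(src2, dbfstart, dbfend, dshift, c2den=0, c2d=0, cbfmask=0):
    """Return (data_result, data_predicate_result) for DSEXT."""
    bfstart = max(dbfstart, dshift)
    dmask = ((2 << dbfend) - (1 << bfstart)) & M32 if dbfend >= bfstart else 0
    sign = (src2 >> dshift) & 1
    res = (src2 & ~dmask & M32) | (dmask if sign else 0)
    if c2den:
        res = (res & ~cbfmask & M32) | (c2d & cbfmask)
    return res, sign
```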


The DADD16_R operation has the following semantics:

src1 = (data_source_1 >> (16 * DHI)) & 0xffff;
src2 = (command_source_1 >> (16 * DHI2)) & 0xffff;
if (DSUB == 0)
    sum = (src1 + src2) & 0xffff;
else
    sum = (src1 - src2) & 0xffff;
data_result = (data_source_1 & ~(0xffff << (16 * DHI))) | sum << (16 * DHI);
data_predicate_result = sum >> 15 & 1;
skip_special_destination = false;
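The three 16-bit operations (DADD16_I, DLOGOP16_I and DADD16_R) share the same select-a-half and replace-a-half pattern. A Python sketch with hypothetical helper names follows; each function returns (data_result, data_predicate_result), and the skip flag of DADD16_I is simply DDSTSKIP, so it is left to the caller.

```python
M32 = 0xFFFFFFFF


def _half(value, hi):
    """Low (hi=0) or high (hi=1) 16 bits of value."""
    return (value >> (16 * hi)) & 0xFFFF


def _put_half(value, hi, half):
    """Replace the selected 16-bit half of value, keeping the other half."""
    return (value & ~(0xFFFF << (16 * hi)) & M32) | (half << (16 * hi))


def dadd16_i(src1, dimm16, dhi):
    s = (_half(src1, dhi) + dimm16) & 0xFFFF
    return _put_half(src1, dhi, s), s >> 15


def dlogop16_i(src1, dimm16, dhi, dlogop):
    src = _half(src1, dhi)
    res = (dimm16, src & dimm16, src | dimm16, src ^ dimm16)[dlogop]  # MOV/AND/OR/XOR
    return _put_half(src1, dhi, res), int(res == 0)


def dadd16_r(dsrc1, csrc1, dhi, dhi2, dsub):
    a, b = _half(dsrc1, dhi), _half(csrc1, dhi2)
    s = (a - b if dsub else a + b) & 0xFFFF
    return _put_half(dsrc1, dhi, s), s >> 15
```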


Destination write 


Once both command and data processing are done, the results are written to the destination registers, as follows:
* the command result is written to the command special register selected by CDST.
* the data result is written to the data special register selected by DDST, unless skip_special_destination is true.
* the data result is written to the GPR selected by DRDST. This can be effectively disabled by setting DRDST to $g6.
* the data predicate result is written to the predicate selected by PDST. This can be effectively disabled by setting
  PDST to $p0.
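The destination-write rules can be sketched as follows. This hypothetical Python model does not apply the special-register bit masks, and it does not model the case where DRDST selects $g7 ($pred) while PDST also writes a predicate, since the text does not specify an order.

```python
CMD_SPECIALS = ("CACC", "CMD", "LUTIDX", "DATAHI")   # CDST values 0-3
DATA_SPECIALS = ("DACC", "DATA")                      # DDST values 0-1
G6 = 14                                               # GPR index of $g6


def write_back(st, f, cmd_result, data_result, data_pred, skip_special):
    """Apply the destination writes to a state dict `st` with keys
    'special' (name -> value), 'gpr' (16 ints) and 'pred' (4-bit int).
    `f` holds the decoded CDST, DDST, DRDST and PDST fields."""
    st["special"][CMD_SPECIALS[f["CDST"]]] = cmd_result
    if not skip_special:
        st["special"][DATA_SPECIALS[f["DDST"]]] = data_result
    if f["DRDST"] != G6:          # writes to $g6 (LUT readout) are ignored
        st["gpr"][f["DRDST"]] = data_result
    bit = 1 << f["PDST"]
    st["pred"] = (st["pred"] & ~bit) | (bit if data_pred else 0) | 1  # $p0 stays 1
```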


Introduction 


Todo: write me 


2.11.4 VP3/VP4/VP5 video decoding 


Contents: 


VP3 MBRING format 


Contents

* VP3 MBRING format
- Introduction
- type 00: Macro block header
  * MPEG2
  * H.264
  * Error
- type 01: Motion vector
  * MPEG2
  * H.264
- type 02: DCT coordinates
- type 03: PCM data
- type 04: Coded block pattern
  * MPEG2
  * H.264
- type 05: Pred weight table
- type 06: End of stream
- Macroblock


Introduction 


The macroblock ring output by the VLD is packet based, and aligned on 32-bit word size.

A packet has the header type in bits [24..31] and length in bits [0..23]. The data length is in words, and doesn't include
the header itself.
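Unlike the VP2 format, the length field here always counts 32-bit payload words, so a packet walker needs no per-type logic. A minimal Python sketch (hypothetical name), assuming the ring has been copied into a list of 32-bit integers and wraparound is handled elsewhere:

```python
def vp3_walk_mbring(words):
    """Yield (type, payload_words) for each VP3 MBRING packet."""
    pos = 0
    while pos < len(words):
        header = words[pos]
        length = header & 0xFFFFFF   # payload words, header excluded
        yield header >> 24, words[pos + 1:pos + 1 + length]
        pos += 1 + length
```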


type 00: Macro block header 
MPEG2 


The macro block header contains 4 data words:
* Word 0:
- [0:15] Absolute address in macroblock units, 0 based
* Word 1:
- [0:7] Y coord in macroblock units, 0 based
- [8:15] X coord in macroblock units, 0 based
* Word 2:
- [0] not coded[??]
- [1] skipped[??]
- [3] quant
- [4] motion forward
- [5] motion backward
- [6] coded_block_pattern
- [7] intra
- [26:26] dct type
- [27:28] motion type
  * 0: field motion
  * 1: frame-based motion
  * 2: 16x8 field
  * 3: dual prime motion
* Word 3:
- [6:7] motion vector count
- [8:12] quantiser scale code

H.264 


* Payload word 0:
  - bits 0-12: macroblock address
* Payload word 1:
  - bits 0-7: y coord in macroblock units
  - bits 8-15: x coord in macroblock units
* Payload word 2:
  - bit 0: first macroblock of a slice flag
  - bit 1: mb skip flag
  - bit 2: mb field coding flag
  - bits 3-8: mb type
  - bits 9+i*4 - 12+i*4, i < 4: sub mb type[i]
  - bit 25: transform size 8x8 flag
* Payload word 3:
  - bits 0-5: mb qp delta
  - bits 6-7: intra chroma pred mode
* Payload word 4:
  - bits i*4+0 - i*4+2, i < 8: rem intra pred mode[i]
  - bit i*4+3, i < 8: prev intra pred mode flag[i]
* Payload word 5:
  - bits i*4+0 - i*4+2, i < 8: rem intra pred mode[i+8]
  - bit i*4+3, i < 8: prev intra pred mode flag[i+8]
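The packed H.264 fields above can be unpacked as sketched below. The sketch assumes that word 5 holds entries 8..15, so that words 4 and 5 together cover the 16 intra 4x4 prediction modes. The function names are hypothetical.

```python
def decode_sub_mb_types(word2):
    """sub_mb_type[i], i < 4, from bits 9+i*4 .. 12+i*4 of payload word 2."""
    return [(word2 >> (9 + i * 4)) & 0xf for i in range(4)]


def decode_intra_pred_modes(word4, word5):
    """Return 16 (prev_intra_pred_mode_flag, rem_intra_pred_mode) pairs.

    Entries 0..7 come from word 4 and entries 8..15 from word 5 (assumed).
    Each entry is a 4-bit field: bits 0-2 hold the rem mode, bit 3 the flag.
    """
    modes = []
    for word in (word4, word5):
        for i in range(8):
            field = (word >> (i * 4)) & 0xf
            modes.append(((field >> 3) & 1, field & 7))
    return modes
```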




nVidia Hardware Documentation, Release git 


Error 


The macro block header contains 3 data words: 
* Word 0:
  - [0:15] Absolute address in macroblock units, 0 based
  - [16] error flag, always set
* Word 1:
  - [0:7] Y coord in macroblock units, 0 based
  - [8:15] X coord in macroblock units, 0 based
* Word 2: all 0


type 01: Motion vector 


MPEG2 


Todo: Verify whether X or Y is in the lowest 16 bits. I assume X 


The motion vector packet is 4 data words long and contains a total of 8 PMVs of 16 bits each. The motion vectors are likely stored in the order the spec uses for PMV[r][s][t].


The layout of each 16-bit PMV:
* [0:5] motion code
* [6:13] residual
* [14] motion vertical field select
* [14:15] dmvector (0, 1, or 3)


motion vertical field select and dmvector occupy the same bits, but the MPEG spec makes them mutually exclusive, so they do not conflict.
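The sketch below splits the four motion vector words into their eight 16-bit PMVs. It makes three assumptions. First, the PMV in the low 16 bits comes first, which the Todo above leaves open. Second, the dmvector encodings 0, 1 and 3 are 2-bit two's complement for 0, +1 and -1. Third, the motion code is returned raw, because its signedness is not documented here. Both readings of bit 14 are returned; which one applies depends on the MPEG motion type.

```python
def decode_pmv_word(word):
    """Split one 32-bit type-01 payload word into its two 16-bit PMVs."""
    pmvs = []
    for half in (word & 0xffff, (word >> 16) & 0xffff):  # low half first (assumed)
        dmv_raw = (half >> 14) & 3
        pmvs.append({
            "motion_code": half & 0x3f,               # bits [0:5], raw
            "residual": (half >> 6) & 0xff,           # bits [6:13]
            "mv_field_select": (half >> 14) & 1,      # bit [14]
            # bits [14:15]: 0, 1, 3 read as two's complement 0, +1, -1 (assumed)
            "dmvector": dmv_raw - 4 if dmv_raw & 2 else dmv_raw,
        })
    return pmvs


def decode_pmv_packet(words):
    """Decode all 4 payload words into the 8 PMVs, in word order."""
    return [pmv for word in words for pmv in decode_pmv_word(word)]
```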


H.264 


Payload like VP2, except length is in 32-bit words. 


type 02: DCT coordinates 


A packet of this type is created for each block enabled in the coded block pattern. The packet is byte-oriented rather than word-oriented. The coordinates are split into 16 chunks of 4 coordinates each: 0..3 become chunk 0, 4..7 become chunk 1, and so on up to 60..63, which become chunk 15. The first 2 bytes contain a 16-bit bitmask indicating the presence of each chunk. Only chunks whose bit is set are encoded further.

Each present chunk starts with an 8-bit mask giving the size of each coordinate in that chunk, 2 bits per coordinate (0 = not present, 1 = 1 byte, 2 = 2 bytes). The mask is followed by all coordinates present in the chunk. The last chunk is padded with 0s to align to the word size.


For example, in the bytes 0x10 0x00 0x40 0xff, chunk 4 is present ((0x0010 >> 4) & 1), and its position 3 is present with a size of 1 byte ((0x40 >> (2*3)) & 3) and set to -1 (0xff).
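The chunked encoding described above can be decoded with a short routine. This is a sketch under assumptions that the text does not state:

* the 16-bit chunk mask and any 2-byte values are little-endian, which is consistent with 0x10 0x00 meaning 0x0010 in the example;
* values are two's complement;
* the coefficient index is chunk*4 + position.

```python
def decode_dct_packet(data):
    """Decode one type-02 packet (bytes) into {coefficient index: value}."""
    chunk_mask = int.from_bytes(data[0:2], "little")  # bit n set: chunk n present
    pos = 2
    coeffs = {}
    for chunk in range(16):
        if not (chunk_mask >> chunk) & 1:
            continue
        size_mask = data[pos]          # 2 bits per coordinate in this chunk
        pos += 1
        for slot in range(4):
            size = (size_mask >> (2 * slot)) & 3   # 0: absent, 1: 1 byte, 2: 2 bytes
            if size == 0:
                continue
            coeffs[chunk * 4 + slot] = int.from_bytes(
                data[pos:pos + size], "little", signed=True)
            pos += size
    return coeffs                      # trailing padding bytes are ignored
```

On the example bytes from the text, `decode_dct_packet(bytes([0x10, 0x00, 0x40, 0xff]))` returns `{19: -1}`, that is, chunk 4, position 3.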


type 03: PCM data 


The payload length is 0x60 words. The packet is byte-oriented instead of word-oriented. The payload is raw PCM data from the bitstream.


type 04: Coded block pattern 
MPEG2 


This packet carries the coded block pattern in 1 data word.


H.264 


Payload like VP2. 


type 05: Pred weight table 


Payload like VP2, except length is in 32-bit words. 


type 06: End of stream 


This packet has no payload and signals to the parser that it is done.


Macroblock 


A macroblock is created in this order:
* motion vector (optional)
* macro block header
* DCT coordinates / PCM samples (optional, and repeated as many times as needed)
* coded block pattern (optional)

"Optional" is relative to the MPEG spec. For example, intra frames always require a coded block pattern.
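The ordering above can be checked mechanically. The sketch maps packet types to letters and uses a regular expression for "optional motion vector, header, any number of DCT/PCM packets, optional coded block pattern". The letter mapping is an arbitrary choice for this illustration. Packet types other than 00-04 and 06, such as 05, are not handled.

```python
import re

# Arbitrary one-letter codes so the documented order can be written as a regex.
TYPE_LETTERS = {0x00: "H", 0x01: "M", 0x02: "D", 0x03: "P", 0x04: "C"}
MACROBLOCK_RE = re.compile(r"M?H[DP]*C?")


def split_macroblocks(packet_types):
    """Group packet types into macroblocks, stopping at type 06 (end of stream).

    Raises KeyError for types without a letter (e.g. 05); packets that do not
    fit the pattern are skipped by finditer.
    """
    letters = []
    for t in packet_types:
        if t == 0x06:
            break
        letters.append(TYPE_LETTERS[t])
    text = "".join(letters)
    return [list(packet_types[m.start():m.end()])
            for m in MACROBLOCK_RE.finditer(text)]
```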


Introduction 


Todo: write me 


2.12 Performance counters 


Contents: 
2.12.1 NV10:NV40 signals


Contents 


* NV10:NV40 signals


Todo: convert 


=== NV10 signals ===

0x70: PGRAPH.PM_TRIGGER
0x87: PTIMER_TIME_B12 [bus/ptimer.txt]
0x80: trailer base

=== NV15 signals ===

0x70: PGRAPH.PM_TRIGGER
0x87: PTIMER_TIME_B12 [bus/ptimer.txt]
0x80: trailer base


=== NV1F signals ===

0x70: PGRAPH.PM_TRIGGER
0x86: HEAD0_VBLANK
0x87: HEAD1_VBLANK
0x80: trailer base


=== NV20 signals ===

domain 0 [nvclk]:
0xaa: HEAD0_VBLANK
0xa0: trailer base

domain 1 [mclk]:
0x20: trailer base

=== NV28 signals ===

domain 0 [nvclk]:
0xaa: HEAD0_VBLANK
0xa0: trailer base

domain 1 [mclk]:
0x20: trailer base

=== NV35 signals ===

domain 0 [nvclk]:
0xf8: HEAD0_VBLANK
0xf9: HEAD1_VBLANK
0xe0: trailer base


domain 1 [mclk]:
0x20: trailer base

=== NV31 signals ===

domain 0 [nvclk]:
0xf8: HEAD0_VBLANK
0xf9: HEAD1_VBLANK
0xe0: trailer base

domain 1 [mclk]:
0x20: trailer base

=== NV34 signals ===

domain 0 [nvclk]:
0xda: HEAD0_VBLANK
0xdb: HEAD1_VBLANK
0xe0: trailer base

domain 1 [mclk]:
0x20: trailer base


2.12.2 NV40:G80 signals 


Contents 


* NV40:G80 signals
  - Introduction


Introduction 


NV40 generation cards have the following counter domains:
* NV40 generation cards without turbocache:
  - 0: host clock
  - 1: core clock [PGRAPH front]
  - 2: geometry[?] clock [PGRAPH back]
  - 3: shader clock
  - 4: memory clock
* NV40 generation with turbocache that are not IGPs:
  - 0: host clock
  - 1: core clock [PGRAPH front]
  - 2: shader clock
  - 3: memory clock
* NV40 IGP:
  - 0: host clock
  - 1: core clock [PGRAPH probably]
  - 2: core clock [shaders probably]
  - 3: unknown, could be the memory interface


Todo: figure it out 


Todo: find some, I don’t know, signals? 


2.12.3 G80:GF100 signals 


Contents 


* G80:GF100 signals
  - Introduction
  - Host clock
  - Core clock A
  - Core clock B
  - Shader clock
  - Memory clock
  - Core clock C
  - Vdec clock (VP2)
  - Vdec clock (VP3/VP4)
  - Core clock D


Introduction 


G80 generation cards have the following counter domains: 


* G80:
  - 0: host clock
  - 1: core clock A
  - 2: core clock B
  - 3: shader clock
  - 4: memory clock
* G84:GF100 except MCP7x:
  - 0: host clock
  - 1: core clock A
  - 2: core clock B
  - 3: shader clock
  - 4: memory clock
  - 5: core clock C
  - 6: vdec clock
  - 7: core clock D
* MCP7x:
  - 0: host clock
  - 1: core clock A
  - 2: core clock B
  - 3: shader clock
  - 4: core clock C
  - 5: vdec clock
  - 6: core clock D


Todo: figure out roughly what stuff goes where 


Todo: find signals. 
Host clock

signal G80 G84 G86 G92 G94 G96 G98 G200 MCP77 MCP79 GT215 GT216 GT218 MCP89 documentation
HOST_MEM_WR 04 04 04 04 04 05 ?? ?? 1a 1a 1a ?? [XXX]
PCOUNTER.USER 2a-2b 2a-2b 2a-2b 3a-3b pcounter/intro.txt
??? ?? ?? ?? ?? ?? ?? 92 6a ?? ?? ?? ?? ?? 92 all PFIFO engines enabled and idle???
??? ?? ?? ?? ?? ?? ?? 28 22 27 ?? ?? ?? 27 ?? happens once with PFIFO write or PDISPLAY access [not PFIFO read]
??? 17 ?? ?? ?? ?? ?? ?? 29 ?? ?? 23 ?? ?? ?? ?? on for 10%
??? ?? ?? ?? 77 92 92 ?? 2a ?? ?? ?? 92 92 99 ?? on for 10%
??? ?? ?? ?? ?? ?? ?? 92 2b ?? 77 ?? ?? ?? ?? pcie activity wakeups [long]?!?
??? ?? ?? ?? ?? ?? ?? ?? 2c ?? 22 ?? ?? ?? ?? pcie activity bursts?!?
??? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? 74 ?? ?? MMIO reads?
HOST_MEM_RD 1f 27 2a 2a 2a 2e ?? ?? 96 96 96 ?? [XXX]
??? 1c 21 ?? 29 ?? 2c ?? 30 ?? ?? 97 ?? 98 ?? triple MMIO read?
PBUS_PCIE_RD 22 2a 24 24 24 31 ?? 22 99 99 99 92 [XXX]
PTIMER_TIME_B12 27 2c 2c 34 37 37 37 3b 53 53 a3 a3 a3 4a bus/ptimer.txt
PBUS_PCIE_TRANS 36 39 39 39 34 22 22 a5 a5 a5 ?? [XXX]
PBUS_PCIE_WR 2f 37 3a 3a 3a 3e ?? ?? a6 a6 a6 92 [XXX]
PCOUNTER.TRAILER 4c- 4c- 4c- 4c- 6c- 8c- 8c- ec- ec- ec- 8c- pcounter/intro.txt
3f 5f 5f 5f 5f 5f 5f 71 9f 9f ff ff ff 9f


Core clock A 
signal G80 G84 G86 G92 G94 G96 G98 G200 MCP77 MCP79
TPC.GEOM.MUX 10-16 00-06 00-06 00-06 00-06 00-06 00-06 ?? 00-06 00-06
ZCULL.??? 20-25 07-0c 07-0c 07-0c 07-0c 07-0c 07-0c 07-0c ?? ??






Table 20 (continued from previous page)

signal G80 G84 G86 G92 G94 G96 G98 G200 MCP77 MCP79
TPC.RAST.??? ?? 19 19 19 19 19 ?? ?? ?? ??
TPC.RAST.??? ?? 1a 1a 1a 1a 1a 99 99 99 99
PREGEOM.??? ?? ?? ?? ?? ?? ?? 99 99 99 99
PREGEOM.??? ?? ?? ?? ?? ?? ?? 99 99 99 99
POSTGEOM.??? ?? ?? ?? ?? ?? ?? 99 99 99 99
POSTGEOM.??? ?? ?? ?? ?? ?? ?? 99 99 99 99
RATTR.??? 99 99 99 99 99 99 99 22 99 99
APLANE.CG - 31-33 31-33 31-33 31-33 31-33 31-33 92 39-3b 39-3b
RATTR.CG - 37-39 37-39 37-39 37-39 37-39 37-39 72 43-45 43-45
ZCULL.??? ?? ?? 4f 4f 4f 4f 4f ?? 99 99
VFETCH.MUX 26-3f 66-7f 66-7f 66-7f 66-7f 66-7f 66-7f 46-5f 46-5f 46-5f
TPC.RAST.CG - ?? ?? ?? ?? ?? 99 92 99 99
PCOUNTER.USER - - - - - - - - - -
ZCULL.??? 6e ?? ?? ?? ?? ?? ?? 99 99 99
ZCULL.??? ?? ?? ?? ?? ?? ?? ?? ?? ?? 75
ZCULL.??? ?? ?? ?? ?? ?? ?? 99 99 99 99
APLANE.CG_IFACE_DISABLE 73 - - - - - - - - -
VATTR.??? 77-19 99 ?? 99 99 99 99 99 99 99
VATTR.??? ?? 57 ?? 57 57 57 57 ?? 7a ??
VATTR.??? ?? 59 ?? 59 59 59 59 ?? 7f ??
VATTR.??? 7c 5c 5c 5c 5c 5c 5c 82 ?? 22
VATTR.??? 7a 54 54 54 54 54 54 83 ?? 99
VATTR.CG_IFACE_DISABLE 7e - - - - - - - - -
STRMOUT.??? 7f 5e 5e 5e 5e 5e 5e 84 ?? 99
STRMOUT.??? 80 5f 5f 5f 5f 5f 5f 85 ?? 99
STRMOUT.??? 81 ?? ?? ?? ?? ?? ?? 99 99 99
STRMOUT.??? ?? ?? ?? ?? ?? ?? ?? ?? 85 ??
CLIPID.??? ?? ?? ?? ?? ?? ?? ?? ?? ?? 8a
CLIPID.??? ?? ?? ?? ?? ?? ?? ?? ?? ?? 8c
RMASK.??? ?? ?? ?? ?? ?? ?? ?? ?? 8e ??
STRMOUT.CG_IFACE_DISABLE 82 - - - - - - - - -
TPC.GEOM.??? 8d 85 85 85 85 85 85 ?? ?? 91
TPC.GEOM.??? 8f 87 87 87 87 87 87 ?? ?? 93
TPC.GEOM.??? 91 89 89 89 89 89 89 ?? ?? 95
TPC.GEOM.??? 93 8b 8b 8b 8b 8b 8b ?? ?? 97
TPC.GEOM.??? ?? ?? ?? ?? ?? ?? ?? ?? 91 ??
TPC.GEOM.??? ?? ?? ?? ?? ?? ?? ?? ?? 93 ??
TPC.GEOM.??? ?? ?? ?? ?? ?? ?? ?? ?? 95 ??
RATTR.CG_IFACE_DISABLE 95 -
RATTR.??? 96 ?? 27 ?? ?? ?? ?? ?? ?? ??
RATTR.??? 97 ?? ?? ?? ?? ?? ?? ?? ?? ??
RATTR.??? 98 ?? ?? ?? ?? ?? ?? ?? ?? ??
RATTR.??? 99 ?? ?? ?? ?? ?? ?? ?? ?? ??
RATTR.??? ?? 84 84 84 84 84 84 ?? 97 ??
TPC.RAST.??? 9b 92 92 92 92 92 92 ?? 9c 9e
TPC.RAST.??? 94 94 94 94 94 94 94 ?? 9e a0
ENG2D.??? ?? ?? 9b 9b 9b 9b 9b ?? ?? a7
ENG2D.??? ?? ?? 94 94 94 94 94 ?? ?? a9
ENG2D.CG_IFACE_DISABLE a7 -
Table 20 (continued from previous page)

signal G80 G84 G86 G92 G94 G96 G98 G200 MCP77 MCP79
??? ae a4 a4 a4 a4 a4 a4 b0 ?? ??
VCLIP.??? 58 ae ?? ac ae ae ae ?? 58 ba
VCLIP.??? ba b0 ?? b0 b0 b0 b0 ?? ba bc
VCLIP.CG_IFACE_DISABLE bb -
DISPATCH.??? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
PGRAPH.IDLE c8 bd bd bd bd bd bd c9 ?? c9
PGRAPH.INTR ca bf bf bf bf bf bf cb ?? cb
CTXCTL.USER 42-45 c7-ca c7-ca c7-ca c7-ca c7-ca c7-ca 43-46 41-44 d3-d6
TRAST.??? dc d2 d2 d2 d2 d2 d2 de ?? ??
TRAST.??? dd d3 d3 d3 d3 d3 d3 df ?? ??
TRAST.??? de d4 d4 d4 d4 d4 d4 e0 99 ??
TRAST.??? df d5 d5 d5 d5 d5 d5 e1 ?? ??
TRAST.??? e2 48 48 48 48 48 48 e4 ?? ??
TRAST.??? e3 d9 d9 d9 d9 d9 d9 e5 e3 e5
TRAST.??? e5 db db db db db db ?? e5 e7
TRAST.CG_IFACE_DISABLE e6 - - - - - - - - -
PCOUNTER.TRAILER ee-ff ec-ff ec-ff ec-ff ec-ff ec-ff ec-ff ec-ff ec-ff ec-ff

Core clock B

signal G80 G84 G86 G92 G94 G96 G98 G200 MCP77 MCP79 G
PROP.MUX 00-07 00-07 00-07 00-07 00-07 00-07 00-07 00-07 00-07 00-07 0c
PVPE.??? 3a ?? ?? ?? ?? 99 - ?? - - -
CCACHE.??? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? 27
CCACHE.??? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? 27
TEX.??? 40 1a 1a 1a 1a 1a 1a 32 ?? ?? 3a
TEX.??? 41 1b 1b 1b 1b 1b 1b 33 ?? ?? 3t
TEX.??? 42 1c 1c 1c 1c 1c 1c 34 ?? ?? 3c
VATTR.??? ?? ?? ?? ?? ?? ?? ?? ?? ?? 3c ??
VATTR.??? ?? ?? ?? ?? ?? ?? ?? ?? ?? 3e ??
STRMOUT.??? ?? ?? ?? ?? ?? ?? ?? ?? 27 46 27
STRMOUT.??? ?? ?? ?? ?? ?? ?? ?? ?? ?? 48 2
CBAR.MUX0 4a-4d 24-27 24-27 24-27 24-27 24-27 24-27 ?? 49-4c 49-4c 51
CBAR.MUX1 4e-51 28-2b 28-2b 28-2b 28-2b 28-2b 28-2b 22 44-50 44-50 52
CROP.MUX 52-55 30-33 30-33 30-33 30-33 30-33 30-33 55-58 55-58 55-58 64
ENG2D.??? ?? ?? ?? 36-37 36-37 36-37 22 ?? ?? ?? ??
ZBAR.MUX 56-59 38-3b 38-3b 38-3b 38-3b 38-3b 38-3b ?? 68-6b 68-6b 7c
??? 64 ?? ?? ?? ?? ?? ?? ?? ?? ?? 27
??? 5e ?? ?? ?? ?? 22 ?? ?? ?? ?? 27
??? 64 99 99 ?? ?? ?? ?? ?? ?? ?? 27
??? 68 ?? ?? ?? ?? ?? ?? ?? ?? ?? 2
VCLIP.??? ?? ?? ?? ?? ?? ?? ?? ?? 64 ?? ??
VCLIP.??? ?? ?? ?? ?? ?? ?? ?? ?? 65 ?? ??
ZROP.MUX 6c-6f 44-47 44-47 44-47 44-47 44-47 44-47 74-77 74-77 74-77 Tc
TEX.??? 70-73 48-4b 48-4b 48-4b 48-4b 48-4b 48-4b 78-7b 78-7b 78-7b 8(
PCOUNTER.USER - 9e
??? 80 99 99 ?? ?? ?? ?? ?? ?? ?? 2
PVPE.??? 89-a6 ?? ?? ?? ?? ?? - ?? - - -
Table 21 (continued from previous page)

signal G80 G84 G86 G92 G94 G96 G98 G200 MCP77 MCP79 G
PROP.??? ab ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
MMU.CG_IFACE_DISABLE ac - -
MMU.BIND ad - -
PFB.CG_IFACE_DISABLE b8 - -
PFB.WRITE c3 - -
PFB.READ c4 - -
PFB.FLUSH c5 - -
ZCULL.CG - 58-5a 58-5a 58-5a 58-5a 58-5a 58-5a 22 54-51 54-51 5c
VATTR.CG ?? 84-86 84-86 8c
STRMOUT.CG ?? 87-89 87-89 81
CLIPID.CG ?? 8a-8c 8a-8c 92
ENG2D.CG - 60-62 60-62 60-62 60-62 60-62 60-62 22 84-81 84-81 9:
VCLIP.CG ?? 90-92 90-92 9t
RMASK.CG ?? 93-95 93-95 ac
TRAST.CG - 63-65 63-65 63-65 63-65 63-65 63-65 22 96-98 96-98 a3
TEX.CG - 66-68 66-68 66-68 66-68 66-68 66-68 22 99-9b 99-9b ać
TEX.CG_IFACE_DISABLE dd - -
TEX.UNK6.??? df 7a 7a 7a 7a 7a 7a ?? aa aa b7
CCACHE.CG_IFACE_DISABLE ea - -
PSEC.PM_TRIGGER_ALT c4 c4 -
PSEC.WRCACHE_FLUSH_ALT c5 c5 -
PSEC.FALCON c6-d9 c6-d9 -
PCOUNTER.TRAILER ee-ff 8c-9f 8c-9f 8c-9f 8c-9f 8c-9f 8c-9f ec-ff ec-ff ec-ff cc
Shader clock 
* 0x00-0x03: MPC GROUP 0 
* 0x04-0x07: MPC GROUP 1 
* 0x08-0x0b: MPC GROUP 2 
* 0x0c-0x0f: MPC GROUP 3
* [XXX] 
* 0x13-0x14: PCOUNTER.USER [GT215:] 
* 0x2e-0x3f: PCOUNTER.TRAILER [G80] 
* 0x2c-0x3f: PCOUNTER.TRAILER [G84:] 
Memory clock 
MCP7x cards don't have this domain. MCP89 does.
signal G80 G84 G86 G92 G94 G96 G98 G200 GT215 GT216 GT218 MCP89 documentation
PFB.UNK6.CG_IFACE_DISABLE
PFB.UNK6.CG - 14-16 14-16 14-16 14-16 14-16 14-16 ?? 1a-1c 1a-1c 1a-1c ?
PCOUNTER.USER 3b-3c 3b-3c 37-38 6a-6b pcounter/intro.txt
PCOUNTER.TRAILER 4c- 4c- 4c- 4c- 4c- 4c- 6c- 6c- 6c- 6c- ec- pcounter/intro.txt
3f 5f 5f 5f 5f 5f 5f 71 71 7f 71 ff


Core clock C 
signal G84 G86 G92 G94 G96 G98 G200 MCP77 MCP79 GT215 GT216 GT218 MCP89 documentation
PBSP.USER ?? ?? - ?? ?? - 00-07 [also on core clock D]
PVP2.USER ?? ?? - ?? ?? - 08-0f [also on core clock D]
VCLIP.??? 20 20 20 20 20 20 ?? ?? ?? ?? ?? ?? ??
VCLIP.??? 21 21 21 21 21 21 ?? ?? ?? 92 ?? ?? ??
VATTR.CG 24-26 24-26 24-26 24-26 24-26 24-26 72 [also on core B]
STRMOUT.CG 27-29 27-29 27-29 27-29 27-29 27-29 ?? [also on core B]
VCLIP.CG 2a-2c 2a-2c 2a-2c 2a-2c 2a-2c 2a-2c ?? [also on core B]
VUC_IDLE ?? ?? ?? ?? ?? - 34 vdec/vuc/perf.txt
VUC_SLEEP ?? ?? ?? ?? ?? - 36 vdec/vuc/perf.txt
VUC_WATCHDOG ?? ?? ?? ?? - 38 vdec/vuc/perf.txt
VUC_USER_PULSE ?? ?? ?? ?? - 39 vdec/vuc/perf.txt
VUC_USER_CONT ?? ?? ?? ?? - 3a vdec/vuc/perf.txt
PSEC.PM_TRIGGER_ALT - - 37 [this and other PSEC stuff on core clock B on MCP*]
PSEC.WRCACHE_FLUSH_ALT - 38
PSEC.FALCON 39-4c
PCOUNTER.USER 10-11 10-11 10-11 10-11 pcounter/intro.txt
PCOPY.PM_TRIGGER_ALT 14 14 14 14
PCOPY.WRCACHE_FLUSH_ALT 1e 1e 1e 1e
PCOPY.FALCON 1f-32 1f-32 1f-32 1f-32 falcon/perf.txt
PDAEMON.PM_TRIGGER_ALT 3e 3e 3e 3e
PDAEMON.WRCACHE_FLUSH_ALT 3f 3f 3f 3f
PDAEMON.FALCON 40-53 40-53 40-53 40-53 falcon/perf.txt
PCOUNTER.TRAILER 4c- 4c- 4c- 6c- 6c- 0c- 0c- 6c- 6c- 6c- 6c- pcounter/intro.txt
5f 5f 5f 5f 5f 7f 7f 1f 1f 7f 7f 7f 7f
Vdec clock (VP2)

signal G84 G86 G92 G94 G96 G200 documentation
PVP2.USER 0 27 ?? 00-07 22 ?? 00-07 vdec/vp2/intro.txt
PVP2.CG_IFACE_DISABLE 28 28 28 28 28 ?? what?
PCOUNTER.TRAILER ac-bf ac-bf ac-bf ac-bf ac-bf ac-bf pcounter/intro.txt
Vdec clock (VP3/VP4) 
signal G98 MCP77 MCP79 GT215 GT216 GT218 MCP89 documentation
PCOUNTER.USER - - - 10-11 10-11 10-11 10-11 pcounter/intro.txt
PVLD.FALCON 10-23 10-23 10-23 16-29 16-29 16-29 16-29 falcon/perf.txt
PPPP.FALCON 40-53 40-53 40-53 2a-3d 2a-3d 2a-3d 2a-3d falcon/perf.txt
VUC_IDLE 5d ?? ?? ?? 88 ?? ?? vdec/vuc/perf.txt
VUC_SLEEP 5e ?? ?? 27 89 ?? ?? vdec/vuc/perf.txt
VUC_WATCHDOG 5f ?? ?? 22 8a ?? ?? vdec/vuc/perf.txt
VUC_USER_CONT 60 ?? ?? ?? 80 ?? ?? vdec/vuc/perf.txt
VUC_USER_PULSE 61 ?? 92 ?? 8c ?? ?? vdec/vuc/perf.txt
PPDEC.FALCON 8e-a1 8e-a1 8e-a1 3e-51 3e-51 3e-51 3e-51 falcon/perf.txt
PVCOMP.FALCON - - - - - - 52-65 falcon/perf.txt
PVLD.??? ?? ?? ?? ?? 54-58 ?? ??
PPPP.??? ?? ?? ?? ?? 5f-7e ?? ??
PPDEC.XFRM.??? ?? ?? ?? ?? a0-a4 ?? ??
PPDEC.UNK580.??? ?? ?? ?? ?? ad-af ?? ??
PPDEC.UNK680.??? ?? ?? 97 ?? 56 ?? ??
PVLD.CRYPT.??? ?? ?? ?? ?? c0-c5 97 ??
PCOUNTER.TRAILER ac-bf ac-bf ac-bf cc-df cc-df cc-df ec-ff pcounter/intro.txt
Core clock D 
signal G84 G86 G92 G94 G96 G98 G200 MCP77 MCP79 GT21*
PBSP.USER ?? ?? 00-07 ?? ?? - - - - -
PVP2.USER ?? ?? 08-0f 22 ?? - - - - -
PFB.CG 10-12 10-12 10-12 10-12 10-12 00-02 22 00-02 00-02 00-02
??? ?? 77 97 ?? ?? 07 ?? ?? ?? ??
MMU.CG 3a-3c 3a-3c 3a-3c 3a-3c 3a-3c 14-18 ?? 24-26 24-26 14-1
PBSP.CG 55-54 3d-3f 63-65 50-54 5b-5d - ?? - - -
??? ?? ?? ?? ?? ?? 22 ?? ?? 27 ??
??? ?? ?? ?? ?? ?? 23 ?? ?? ?? ??
??? ?? ?? ?? ?? ?? 24 ?? ?? ?? ??
??? ?? ?? ?? ?? ?? 2c ?? ?? ?? ??
??? ?? ?? ?? ?? ?? 2e ?? ?? ?? ??
??? 72 ?? ?? ?? ?? 30 ?? ?? ?? ??
??? ?? 22 ?? ?? ?? 32 ?? ?? ?? ??
Table 22 (continued from previous page)

signal G84 G86 G92 G94 G96 G98 G200 MCP77 MCP79 GT21*
PCOUNTER.USER - 4f-50
MMU.BIND 29 5a ?? ?? ?? 34 ?? 32 32 54
PFB.WRITE ?? 6f ?? 22 ?? 4 75 40 40 7a
PFB.READ ?? 70 ?? 22 ?? 4c 76 41 41 7e
PFB.FLUSH ?? 71 ?? ?? ?? 44 77 42 42 71
PVLD.PM_TRIGGER_ALT 65 - 6d 6f 9a
PVLD.WRCACHE_FLUSH_ALT 66 - 6e 70 9b
PPPP.PM_TRIGGER_ALT 71 - 79 70 a7
PPPP.WRCACHE_FLUSH_ALT - - - - - 72 - 7a 7c a8
PPDEC.PM_TRIGGER_ALT - - - - - 8c - 94 96 54
PPDEC.WRCACHE_FLUSH_ALT - - - - - 84 - 95 97 55
PVCOMP.PM_TRIGGER_ALT - - - - - - - - - -
PVCOMP.WRCACHE_FLUSH_ALT - - - - - - - - - -
IREDIR_STATUS - - - - - - - - - c6
IREDIR_HOST_REQ - - - - - - - - - c7
IREDIR_TRIGGER_DAEMON - - - - - - - - - c8
IREDIR_TRIGGER_HOST - - - - - - - - - c9
IREDIR_PMC - - - - - - - - - ca
IREDIR_INTR - - - - - - - - - cb
MMIO_BUSY - - - - - - - - - cc
MMIO_IDLE - - - - - - - - - cd
MMIO_DISABLED - - - - - - - - - ce
TOKEN_ALL_USED - - - - - - - - - cf
TOKEN_NONE_USED - - - - - - - - - d0
TOKEN_FREE - - - - - - - - - d1
TOKEN_ALLOC - - - - - - - - - d2
FIFO_PUT_0_WRITE - - - - - - - - - d3
FIFO_PUT_1_WRITE - - - - - - - - - d4
FIFO_PUT_2_WRITE - - - - - - - - - d5
FIFO_PUT_3_WRITE - - - - - - - - - d6
INPUT_CHANGE - - - - - - - - - d7
OUTPUT_2 - - - - - - - - - d8
INPUT_2 28 d9
THERM_ACCESS_BUSY - da
PCOUNTER.TRAILER ec-ff cc-df ec-ff ec-ff ec-ff ac-bf 8c-9f ac-bf ac-bf ec-ff


2.12.4 Fermi+ signals 


Contents 


* Fermi+ signals
  - GF100
  - GF116 signals


Todo: convert 
GF100

HUB domain 2:

* source 0: CTXCTL
  - 0x18: ???
  - 0x1b: ???
  - 0x22-0x27: CTXCTL.USER
* source 1: ???
  - 0x2e-0x2f: ???

HUB domain 6:

* source 1: DISPATCH
  - 0x01-0x06: DISPATCH.MUX
* source 8: CCACHE
  - 0x08-0x0f: CCACHE.MUX
* source 4: UNK6000
  - 0x28-0x2f: UNK6000.MUX
* source 2:
  - 0x36: ???
* source 5: UNK5900
  - 0x39-0x3c: UNK5900.MUX
* source 7: UNK7800
  - 0x42: UNK7800.MUX
* source 0: UNK5800
  - 0x44-0x47: UNK5800.MUX
* source 6:
  - 0x4c: ???
GPC domain 0:
* source 0x16:
  - 0x02-0x09: GPC.TPC.L1.MUX
* source 0x19: TEX.MUX_C_D
  - 0x0a-0x12: GPC.TPC.TEX.MUX_C_D
* source 0: CCACHE.MUX_A
  - 0x15-0x19: GPC.CCACHE.MUX_A
* source 5:
  - 0x1a-0x1f: GPC.UNKC00.MUX
* source 0x14:
  - 0x21-0x28: GPC.TPC.UNK400.MUX
* source 0x17:
  - 0x31-0x38: GPC.TPC.MP.MUX
* source 0x13: TPC.UNK500
  - 0x3a-0x3c: TPC.UNK500.MUX
* source 0xa: PROP
  - 0x40-0x47: GPC.PROP.MUX
* source 0x15: POLY
  - 0x48-0x4d: POLY.MUX
* source 0x11: FFB.MUX_B
  - 0x4f-0x53: GPC.FFB.MUX_B
* source 0xe: ESETUP
  - 0x54-0x57: GPC.ESETUP.MUX
* source 0x1a:
  - 0x50-0x5e: GPC.TPC.TEX.MUX_A
* source 0x18:
  - 0x61-0x64: GPC.TPC.TEX.MUX_B
* source 0xb: UNKB00
  - 0x66-0x68: GPC.UNKB00.MUX
* source 0xc: UNK600
  - 0x6a: GPC.UNK600.MUX
* source 3: ???
  - 0x6e: ???
* source 8: FFB.MUX_A
  - 0x72: ???
  - 0x74: ???
* source 4:
  - 0x76-0x78: GPC.UNKD00.MUX
* source 6:
  - 0x7c-0x7f: GPC.UNKC80.MUX
* source 0xa: UNK380
  - 0x81-0x83: GPC.UNK380.MUX
* source 0x12:
  - 0x84-0x87: GPC.UNKE00.MUX
* source 0xf: UNK700
  - 0x88-0x8b: GPC.UNK700.MUX
* source 1: CCACHE.MUX_B
  - 0x8e: GPC.CCACHE.MUX_B
* source 0x1c:
  - 0x91-0x93: GPC.UNKF00.MUX
* source 0x10: UNK680
  - 0x95: GPC.UNK680.MUX
* source 0x1b: TPC.UNK300
  - 0x98-0x9b: MUX
* source 2: GPC.CTXCTL
  - 0x9c: ???
  - 0xa1-0xa2: GPC.CTXCTL.TA
  - 0xaf-0xba: GPC.CTXCTL.USER
* source 9: ???
  - 0xbf: ???

PART domain 1:
* source 1: CROP.MUX_A
  - 0x00-0x0f: CROP.MUX_A
* source 2: CROP.MUX_B
  - 0x10-0x16: CROP.MUX_B
* source 3: ZROP
  - 0x18-0x1c: ZROP.MUX_A
  - 0x23: ZROP.MUX_B
* source 0: ???
  - 0x27: ???


GF116 signals 


[XXX: figure out what the fuck is going on] 
HUB domain 0:
* source 0: ???
* source 1: ???
  - 0x01-0x02: ???

HUB domain 1:
* source 0: ???
  - 0x00-0x02: ???
* source 1: ???
* source 2: ???
  - 0x13-0x14: ???
* source 3: ???
  - 0x16: ???

HUB domain 2:
* source 0: CTXCTL [?]
  - 0x18: CTXCTL ???
  - 0x22-0x25: CTXCTL USER 0..USER 5
* source 1: ???
  - 0x2e-0x2f: ???
* source 2: PDAEMON
  - 0x14,0x15: PDAEMON PM SEL 2,3
  - 0x2c: PDAEMON PM SEL 0
  - 0x24: PDAEMON PM SEL 1
  - 0x30: PDAEMON ???

HUB domain 3:
* source 0: PCOPY[0].???
  - 0x00: ???
  - 0x02: ???
  - 0x38: PCOPY[0].SRC0 ???
* source 1: PCOPY[0].FALCON
  - 0x17,0x18: PM SEL 2,3
  - 0x2e: PCOPY[0].FALCON ???
  - 0x39: PCOPY[0].FALCON ???
* source 2: PCOPY[0].???
  - 0x12: ???
  - 0x3a: PCOPY[0].SRC2 ???
* source 3: PCOPY[1].???
  - 0x05-0x07: ???
  - 0x3b: PCOPY[1].SRC3 ???
* source 4: PCOPY[1].FALCON
  - 0x19,0x1a: PM SEL 2,3
  - 0x34: PCOPY[1].FALCON ???
  - 0x3c: PCOPY[1].FALCON ???
* source 5: PCOPY[1].???
  - 0x14: ???
  - 0x16: ???
  - 0x34: PCOPY[1].SRC5 ???
* source 6: PPDEC.???
  - 0x0c: ???
  - 0x22: ???
  - 0x24: ???
  - 0x3e: ???
* source 7: PPPP.???
  - 0x0a: ???
  - 0x14: ???
  - 0x1f: ???
  - 0x3f: ???
* source 8: PVLD.???
  - 0x0e-0x10: ???
  - 0x27: ???
  - 0x29: ???
  - 0x40: ???

HUB domain 4:
* 0: PPDEC.???
* 1: PPDEC.FALCON
* 2: PPPP.???
* 3: PPPP.FALCON
* 4: PVLD.???
* 5: PVLD.FALCON

HUB domain 4 signals:
* 0x00-0x03: PPPP.SRC2 ???
* 0x06-0x07: PPDEC.SRC0 ???
* 0x09: PVLD.SRC4 ???
* 0x0b: PVLD.SRC4 ???
* 0x0c,0x0a: PPPP.FALCON PM SEL 2,3
* 0x0e,0x0f: PPDEC.FALCON PM SEL 2,3
* 0x10,0x11: PVLD.FALCON PM SEL 2,3
* 0x16-0x17: PPPP.FALCON ???
* 0x1c-0x1d: PPDEC.FALCON ???
* 0x1e: PVLD.FALCON ???
* 0x24-0x25: PPDEC.SRC0 ???
* 0x26: PPDEC.FALCON ???
* 0x27: PPPP.SRC2 ???
* 0x28: PPPP.FALCON ???
* 0x29: PVLD.SRC4 ???
* 0x2a: PVLD.FALCON ???

HUB domain 5 sources:
* 0: ???

HUB domain 5 signals:
* 0x00: SRC0 ???
* 0x05-0x06: SRC0 ???
* 0x09: SRC0 ???
* 0x0c: SRC0 ???

HUB domain 6 sources:
* 0: ???
* 1: ???
* 2: ???
* 3: ???
* 4: ???
* 5: ???
* 6: ???
* 7: ???
* 8: ???

HUB domain 6 signals:
* 0x0a-0x0b: SRC8 ???
* 0x36: SRC2 ???
* 0x39: SRC5 ???
* 0x45: SRC0 ???
* 0x47: SRC0 ???
* 0x4c: SRC6 ???


2.13 Display subsystem 
Contents: 


2.13.1 NV1 display subsystem 


Contents: 


2.13.2 NV3:G80 display subsystem


Contents: 


VGA stack 


Contents 


* VGA stack
  - Introduction
  - MMIO registers
  - Description
  - Stack access registers
  - Internal operation


Introduction 


A dedicated RAM made of 0x200 8-bit cells arranged into a hardware stack. Its purpose is unknown; it is apparently related to VGA.


Present on NV41+ cards. 


MMIO registers 


On NV41:G80, the registers are located in the PBUS area:
* 001380 VAL
* 001384 CTRL
* 001388 CONFIG
* 00138c SP
They are also aliased in the VGA CRTC register space:
* CR90 VAL
* CR91 CTRL
On G80+, the registers are located in the PDISPLAY.VGA area:
* 619e40 VAL
* 619e44 CTRL
* 619e48 CONFIG
* 619e4c SP

They are also aliased in the VGA CRTC register space, but at a different place:

* CRA2 VAL
* CRA3 CTRL
Description

The stack is made of the following data:
* an array of 0x200 bytes [the actual stack]
* a write shadow byte, WVAL [G80+ only]
* a read shadow byte, RVAL [G80+ only]
* a 10-bit stack pointer [SP]
* 3 config bits:
  - push mode: auto or manual
  - pop mode: auto or manual
  - manual pop mode: read before pop or read after pop
* 2 sticky error bits:
  - stack underflow
  - stack overflow


The stack grows upwards. The stack pointer points to the cell that would be written by the next push. The valid values for the stack pointer are thus 0-0x200, with 0 corresponding to an empty stack and 0x200 to a full stack. If the stack is ever accessed at a position >= 0x200 [which is usually an error], the address wraps modulo 0x200.


The stack can be operated in two major modes: auto mode and manual mode. The mode settings are independent for push and pop accesses; one can use automatic pushes and manual pops, for example. In automatic mode, a write or read access to the VAL register automatically performs the push or pop operation. In manual mode, the push or pop needs to be triggered manually in addition to accessing the VAL register. For manual pushes, the push should be triggered after writing the value. For manual pops, the pop should be triggered before or after reading the value, depending on the selected manual pop mode.


The stack also keeps track of overflow and underflow errors. On NV41:G80, while these error conditions are detected, 
the offending access is still executed [and the stack pointer wraps]. On G80+, the offending access is discarded. The 
error status is sticky. On NV41:G80, it can only be cleared by poking the CONFIG register clear bits. On G80+, the 
overflow status is cleared by executing a pop, and the underflow status is cleared by executing a push. 


Stack access registers 


The stack data is read or written through the VAL register:
MMIO 0x001380 / CR 0x90: VAL [NV41:G80]
MMIO 0x619e40 / CR 0xa2: VAL [G80+]
Accesses a stack entry. A write to this register stores the low 8 bits of the written data as a byte to be pushed. If automatic push mode is set, the value is pushed immediately; otherwise, it is pushed when PUSH_TRIGGER is set. A read from this register returns popped data [causing a pop in the process if automatic pop mode is set]. If manual read-before-pop mode is in use, the returned byte is the byte that the next POP_TRIGGER would pop. In manual pop-before-read mode, it is the byte that the last POP_TRIGGER popped.
The CTRL register is used to manually push/pop the stack and check its status:
MMIO 0x001384 / CR 0x91: CTRL [NV41:G80]
MMIO 0x619e44 / CR 0xa3: CTRL [G80+]
* bit 0: PUSH_TRIGGER - when written as 1, executes a push. Always reads as 0.
* bit 1: POP_TRIGGER - like above, for pop.
* bit 4: EMPTY - read-only, reads as 1 when SP == 0.
* bit 5: FULL - read-only, reads as 1 when SP >= 0x200.
* bit 6: OVERFLOW - read-only, the sticky overflow error bit
* bit 7: UNDERFLOW - read-only, the sticky underflow error bit
Stack configuration registers

To configure the stack, the CONFIG register is used:
MMIO 0x001388: CONFIG [NV41:G80]
MMIO 0x619e48: CONFIG [G80+]
* bit 0: PUSH_MODE - selects push mode [see above]
  - 0: MANUAL
  - 1: AUTO
* bit 1: POP_MODE - selects pop mode [see above]
  - 0: MANUAL
  - 1: AUTO
* bit 2: MANUAL_POP_MODE - for manual pop mode, selects the manual pop submode. Unused in auto pop mode.
  - 0: POP_READ - pop before read
  - 1: READ_POP - read before pop
* bit 6: OVERFLOW_CLEAR [NV41:G80] - when written as 1, clears CTRL.OVERFLOW to 0. Always reads as 0.
* bit 7: UNDERFLOW_CLEAR [NV41:G80] - like above, for CTRL.UNDERFLOW

The stack pointer can be accessed directly by the SP register:
MMIO 0x00138c: SP [NV41:G80]
MMIO 0x619e4c: SP [G80+]
The stack pointer. Only the low 10 bits are valid.
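The CONFIG and CTRL bit assignments above can be captured as constants. This sketch only builds and decodes register values; it does not touch hardware, and the constant and function names are hypothetical.

```python
# CONFIG bits
CONFIG_PUSH_MODE_AUTO = 1 << 0
CONFIG_POP_MODE_AUTO = 1 << 1
CONFIG_MANUAL_POP_READ_POP = 1 << 2   # 0: POP_READ, 1: READ_POP
CONFIG_OVERFLOW_CLEAR = 1 << 6        # NV41:G80 only
CONFIG_UNDERFLOW_CLEAR = 1 << 7       # NV41:G80 only

# CTRL bits
CTRL_PUSH_TRIGGER = 1 << 0
CTRL_POP_TRIGGER = 1 << 1
CTRL_EMPTY = 1 << 4
CTRL_FULL = 1 << 5
CTRL_OVERFLOW = 1 << 6
CTRL_UNDERFLOW = 1 << 7


def make_config(push_auto, pop_auto, read_before_pop=False):
    """Build a CONFIG value; read_before_pop only matters for manual pops."""
    value = 0
    if push_auto:
        value |= CONFIG_PUSH_MODE_AUTO
    if pop_auto:
        value |= CONFIG_POP_MODE_AUTO
    if read_before_pop:
        value |= CONFIG_MANUAL_POP_READ_POP
    return value


def decode_ctrl(value):
    """Decode the read-only status bits of a CTRL read."""
    return {
        "empty": bool(value & CTRL_EMPTY),
        "full": bool(value & CTRL_FULL),
        "overflow": bool(value & CTRL_OVERFLOW),
        "underflow": bool(value & CTRL_UNDERFLOW),
    }
```

For instance, automatic pushes combined with manual read-before-pop pops correspond to `make_config(True, False, True)`, which is 0x5.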


Internal operation 


NV41:G80 VAL write:

    if (SP >= 0x200)
        CTRL.OVERFLOW = 1;
    STACK[SP] = val;
    if (CONFIG.PUSH_MODE == AUTO)
        PUSH();

NV41:G80 PUSH:

    SP++;

NV41:G80 VAL read:

    if (SP == 0)
        CTRL.UNDERFLOW = 1;
    if (CONFIG.POP_MODE == AUTO) {
        POP();
        res = STACK[SP];
    } else {
        if (CONFIG.MANUAL_POP_MODE == POP_READ)
            res = STACK[SP];
        else
            res = STACK[SP-1];
    }

NV41:G80 POP:

    SP--;
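The NV41:G80 pseudocode above can be turned into a small executable model, which makes the wrap-around behaviour easy to experiment with. This is a sketch with three caveats. The class and method names are hypothetical. It models only the documented behaviour. The 10-bit SP wrap on a pop from 0 is inferred from "the stack pointer wraps" in the description.

```python
class Nv41VgaStack:
    """Hypothetical software model of the NV41:G80 VGA stack."""

    SIZE = 0x200      # number of byte cells
    SP_MASK = 0x3ff   # SP is a 10-bit register

    def __init__(self, push_auto=True, pop_auto=True, read_before_pop=False):
        self.stack = [0] * self.SIZE
        self.sp = 0
        self.overflow = False    # CTRL.OVERFLOW, sticky
        self.underflow = False   # CTRL.UNDERFLOW, sticky
        self.push_auto = push_auto
        self.pop_auto = pop_auto
        self.read_before_pop = read_before_pop  # MANUAL_POP_MODE == READ_POP

    def push(self):
        self.sp = (self.sp + 1) & self.SP_MASK

    def pop(self):
        self.sp = (self.sp - 1) & self.SP_MASK

    def write_val(self, val):
        if self.sp >= self.SIZE:
            self.overflow = True                       # flagged, write still happens
        self.stack[self.sp % self.SIZE] = val & 0xff   # address wraps modulo 0x200
        if self.push_auto:
            self.push()

    def read_val(self):
        if self.sp == 0:
            self.underflow = True                      # flagged, read still happens
        if self.pop_auto:
            self.pop()
            return self.stack[self.sp % self.SIZE]
        if not self.read_before_pop:                   # POP_READ: pop came first
            return self.stack[self.sp % self.SIZE]
        return self.stack[(self.sp - 1) % self.SIZE]   # READ_POP

    def clear_errors(self):
        """Equivalent of writing OVERFLOW_CLEAR | UNDERFLOW_CLEAR to CONFIG."""
        self.overflow = False
        self.underflow = False
```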


G80+ VAL write:

    WVAL = val;
    if (CONFIG.PUSH_MODE == AUTO)
        PUSH();

G80+ PUSH:

    if (SP >= 0x200)
        CTRL.OVERFLOW = 1;
    else
        STACK[SP++] = WVAL;
    CTRL.UNDERFLOW = 0;

G80+ VAL read:

    if (CONFIG.POP_MODE == AUTO) {
        POP();
        res = RVAL;
    } else {
        if (CONFIG.MANUAL_POP_MODE == POP_READ || SP == 0)
            res = RVAL;
        else
            res = STACK[SP-1];
    }

G80+ POP:

    if (SP == 0)
        CTRL.UNDERFLOW = 1;
    else
        RVAL = STACK[--SP];
    CTRL.OVERFLOW = 0;
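Likewise, here is a sketch of the G80+ behaviour. On this generation, offending accesses are discarded and each error bit is cleared by the opposite operation. The names are hypothetical. The model follows the pseudocode above as written, including clearing the opposite error bit after every push or pop.

```python
class G80VgaStack:
    """Hypothetical software model of the G80+ VGA stack."""

    SIZE = 0x200

    def __init__(self, push_auto=True, pop_auto=True, read_before_pop=False):
        self.stack = [0] * self.SIZE
        self.sp = 0
        self.wval = 0   # write shadow byte
        self.rval = 0   # read shadow byte
        self.overflow = False
        self.underflow = False
        self.push_auto = push_auto
        self.pop_auto = pop_auto
        self.read_before_pop = read_before_pop  # MANUAL_POP_MODE == READ_POP

    def push(self):
        if self.sp >= self.SIZE:
            self.overflow = True      # offending push is discarded
        else:
            self.stack[self.sp] = self.wval
            self.sp += 1
        self.underflow = False        # per the pseudocode, cleared after every push

    def pop(self):
        if self.sp == 0:
            self.underflow = True     # offending pop is discarded
        else:
            self.sp -= 1
            self.rval = self.stack[self.sp]
        self.overflow = False         # per the pseudocode, cleared after every pop

    def write_val(self, val):
        self.wval = val & 0xff
        if self.push_auto:
            self.push()

    def read_val(self):
        if self.pop_auto:
            self.pop()
            return self.rval
        if not self.read_before_pop or self.sp == 0:
            return self.rval
        return self.stack[self.sp - 1]
```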


2.13.3 G80 display subsystem 


Contents: 


PDISPLAY’s monitoring engine 


Contents 


* PDISPLAY’s monitoring engine
  - Introduction
  - falcon parameters
  - MMIO registers


Todo: write me 


Introduction 


Todo: write me 


falcon parameters 


Present on: 
v0: GF119:GK104 
v1: GK104:GK110 
v2: GK110+
BAR0 address: 0x627000
PMC interrupt line: 26 [shared with the rest of PDISPLAY], also INTR HOST SUMMARY bit 8 
PMC enable bit: 30 [all of PDISPLAY] 
Version: 
v0,v1: 4 
v2: 4.1 
Code segment size: 0x4000 
Data segment size: 0x2000 
Fifo size: 3 
Xfer slots: 8 
Secretful: no 
Code TLB index bits: 8 
Code ports: 1 
Data ports: 4 
Version 4 unknown caps: 31, 27 
Unified address space: no 
IO addressing type: full
Core clock: ??? 
Fermi VM engine: none 


Fermi VM client: HUB 0x03 [shared with rest of PDISPLAY] 
Interrupts:

Line | Type | Present on | Name | Description
12 | level | all | PDISPLAY | DISPLAY DAEMON-routed interrupt
13 | level | all | FIFO |
14 | level | all | ??? | 520? 524 apparently not required
15 | level | v1+ | PNVIO | DISPLAY DAEMON-routed interrupt, but also 554?

Status bits:

Bit | Name | Description
0 | FALCON | Falcon unit
1 | MEMIF | Memory interface

IO registers: MMIO registers

Todo: more interrupts?

Todo: interrupt refs

Todo: MEMIF interrupts

Todo: determine core clock

MMIO registers

Address | Present on | Name | Description
0x627000:0x627400 | all | N/A | Falcon registers
0x627400 | all | ??? | [alias of 610018]
0x627440+i*4 | all | FIFO_PUT |
0x627450+i*4 | all | FIFO_GET |
0x627460 | all | FIFO_INTR |
0x627464 | all | FIFO_INTR_EN |
0x627470+i*4 | all | RFIFO_PUT |
0x627480+i*4 | all | RFIFO_GET |
0x627490 | all | RFIFO_STATUS |
0x6274a0 | v1+ | ??? | 18:18 1721313::37/11
0x627500+i*4 | all | ??? |
0x627520 | v1+? | ??? | interrupt 14
0x627524 | v1+ | ??? | [0/ffffffff/0]
0x627550 | v1+ | ??? | [27 1 0/ffffffff/0]
0x627554 | v1+ | ??? | interrupt 15 [0/1/0]
0x627600:0x627680 | all | MEMIF | Memory interface
0x627680:0x627700 | all | - | [alias of 627600]
Todo: refs


G80 VGA mutexes 


Contents 


* G80 VGA mutexes
  - Introduction
  - MMIO registers
  - Operation


Introduction 


Dedicated hardware supporting trylock and unlock operations on 64 mutexes by 2 clients. Present on
G80+ cards.


MMIO registers 


On G80+, the registers are located in the PDISPLAY.VGA area:

* 619e80 MUTEX_TRYLOCK_A[0]
* 619e84 MUTEX_TRYLOCK_A[1]
* 619e88 MUTEX_UNLOCK_A[0]
* 619e8c MUTEX_UNLOCK_A[1]
* 619e90 MUTEX_TRYLOCK_B[0]
* 619e94 MUTEX_TRYLOCK_B[1]
* 619e98 MUTEX_UNLOCK_B[0]
* 619e9c MUTEX_UNLOCK_B[1]


Operation 


There are 64 mutexes and 2 clients. The clients are called A and B. Each mutex can be either unlocked, locked by
A, or locked by B at any given moment. Each of the clients has two register sets: TRYLOCK and UNLOCK. Each
register set contains two MMIO registers, one controlling mutexes 0-31, the other mutexes 32-63. Bit i of a given
register corresponds directly to mutex i or i+32.


Writing a value to the TRYLOCK register will execute a trylock operation on all mutexes whose corresponding bit 
is set to 1. The trylock operation makes an unlocked mutex locked by the requesting client, and does nothing on an 
already locked mutex. 


Writing a value to the UNLOCK register will likewise execute an unlock operation on selected mutexes. The unlock 
operation makes a mutex locked by the requesting client unlocked. It doesn't affect mutexes that are unlocked or 
locked by the other client. 
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Reading a value from either the TRYLOCK or UNLOCK register will return 1 for mutexes locked by the requesting 
client, 0 for unlocked mutexes and mutexes locked by the other client. 
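The trylock/unlock rules above can be checked against a small software model. The following is a hypothetical Python sketch (`VgaMutexes` and `try_acquire` are illustrative names, not taken from any driver); `reg` selects register 0 (mutexes 0-31) or register 1 (mutexes 32-63), mirroring the `[i]` index of the MUTEX_* registers.

```python
class VgaMutexes:
    """Hypothetical model of the 64 G80 VGA mutexes shared by clients A and B."""

    def __init__(self):
        # None = unlocked, otherwise the owning client ("A" or "B").
        self.owner = [None] * 64

    def trylock(self, client, reg, mask):
        """Write `mask` to MUTEX_TRYLOCK_<client>[reg]."""
        for bit in range(32):
            i = reg * 32 + bit
            if mask >> bit & 1 and self.owner[i] is None:
                self.owner[i] = client  # already-locked mutexes are left alone

    def unlock(self, client, reg, mask):
        """Write `mask` to MUTEX_UNLOCK_<client>[reg]."""
        for bit in range(32):
            i = reg * 32 + bit
            if mask >> bit & 1 and self.owner[i] == client:
                self.owner[i] = None  # mutexes held by the other client are unaffected

    def read(self, client, reg):
        """Read either register of `client`: mask of the mutexes it currently holds."""
        return sum(1 << bit for bit in range(32)
                   if self.owner[reg * 32 + bit] == client)


def try_acquire(m, client, index):
    """Trylock a single mutex, then read back to learn whether this client won it."""
    reg, bit = divmod(index, 32)
    m.trylock(client, reg, 1 << bit)
    return bool(m.read(client, reg) >> bit & 1)
```

Since a TRYLOCK write returns nothing, reading the register back is the only way for a client to tell whether it won the mutex; an UNLOCK write naming a mutex held by the other client has no effect.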


MMIO 0x619e80+i*4, i < 2: MUTEX_TRYLOCK_A Writing executes the trylock operation as client A, treating
the written value as a mask of mutexes to lock. Reading returns a mask of mutexes locked by client A. Bit j of
the value corresponds to mutex i*32+j.


MMIO 0x619e88+i*4, i < 2: MUTEX_UNLOCK_A Like MUTEX_TRYLOCK_A, but executes the unlock operation on write.


MMIO 0x619e90+i*4, i < 2: MUTEX_TRYLOCK_B Like MUTEX_TRYLOCK_A, but for client B.
MMIO 0x619e98+i*4, i < 2: MUTEX_UNLOCK_B Like MUTEX_UNLOCK_A, but for client B.


Todo: convert glossary 
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CHAPTER 3 


nVidia Resource Manager documentation 


Contents: 


3.1 PMU 


PMU is NVIDIA's firmware for PDAEMON, used for DVFS and several other power-management related functions. 


Contents: 


3.1.1 SEQ Scripting ISA 


Contents 


* SEQ Scripting ISA
- Introduction
— SEQ conventions
* Stack layout
* Scratch layout
- Opcodes
- Memory
* SET last
* READ last register
* WRITE last register
* SET register(s)
* WRITE OUT last value
* WRITE OUT
* READ OUT last value
* READ OUT last register
* WRITE OUT TIMESTAMP
- Arithmetic
* OR last
* AND last
* ADD last
* SHIFT-left last
* AND last value, register
* ADD OUT
* OR last value, register
* OR OUT last value
* ADD last value, OUT
* AND OUT last value
— Control flow
* EXIT
* COMPARE last value
* BRANCH EQ
* BRANCH NEQ
* BRANCH LT
* BRANCH GT
* BRANCH
* COMPARE OUT
— Miscellaneous
* WAIT
* WAIT STATUS
* WAIT BITMASK last
* IRQ DISABLE
* IRQ ENABLE
* FB PAUSE/RESUME


Introduction 


NVIDIA uses PDAEMON for power-management related functions, including DVFS. For this, they extended the PDAEMON
firmware, PMU, with a scripting language called SEQ. Scripts are uploaded through falcon I/O.
SEQ conventions
Operations are represented as 32-bit opcodes, followed by 0 or more 32-bit parameters. The opcode is encoded as
follows:
* Bits 0-7: operation
* Bits 16-31: total operation length in 32-bit words (# parameters + 1)
A script ends with 0x0. In the pseudocode in the rest of this document, the following conventions hold:
* $r13 is reserved as the script program counter, aliased pc
* op aliases *pc & 0xffff
* params aliases (*pc & 0xffff0000) >> 16
* param[] points to the first parameter, the first word after *pc
* PMU reserves 0x5c bytes on the stack for general usage, starting at sp+0x24
* scratch[] is a pointer to scratchpad memory from 0x3e0 onward.
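Under the encoding above, a script can be split into instructions by following the length field. The helper below is a hypothetical sketch, not PMU code: it takes the operation from bits 0-7 as the bullet list states (the pseudocode alias `op` uses the low 16 bits instead) and treats a zero word as the script terminator.

```python
def split_seq_script(words):
    """Split a SEQ script (list of 32-bit words) into (op, params) tuples.

    Assumes bits 0-7 hold the operation and bits 16-31 the total
    instruction length in words (parameters + 1); a 0x0 word ends the script.
    """
    insns = []
    pc = 0
    while pc < len(words) and words[pc] != 0:
        word = words[pc]
        op = word & 0xff
        length = word >> 16
        if length == 0 or pc + length > len(words):
            raise ValueError(f"bad instruction length {length} at word {pc}")
        insns.append((op, list(words[pc + 1:pc + length])))
        pc += length
    return insns
```

Because the length field counts the opcode word itself, every real instruction word is nonzero (SET last value with one parameter encodes as 0x00020000), so it cannot be confused with the terminator.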


Stack layout 


Address | Type | Alias | Description

0x00-0x20 | u32[9] | | Caller's $r[0]..$r[8]
0x24 | u32 | *packet.data | Pointer to data structure
0x2a | u16 | in_words | Number of words in the program
0x2c | u32 | in_end | Pointer to the end of the program
0x30 | u32 | insn_len | Length of the currently executed instruction
0x54 | u32 | *head_vert | &(PDISPLAY.HEAD_STAT[0].VERT) + head_off
0x58 | u32 | head_off | Offset for current HEAD from PDISPLAY[0]
0x5c | u32 | in_start | Pointer to the start of the program
0x62 | u16 | word_exit |
0x64 | u32 | timestamp |


Scratch layout 


Type | Name | Description

u8 | out_words | Size of the out memory section, in 32-bit units
u24 | | Unused, padding
u32 | *out_start | Pointer to the out memory section
u8 | flag_eq | 1 if compare val_last == param
u8 | flag_lt | 1 if compare val_last < param
u16 | | Unused, padding
u32 | val_last | The value last read or written. Can be set manually
u32 | reg_last | Holds the register last read or written. Can be set manually
u32 | val_ret | Holds a return value written back to sp[80] after successful execution
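The scratch layout can be cross-checked with Python's `struct` module. This sketch assumes the header is packed little-endian with no implicit alignment, which the table implies but does not state; `parse_scratch` and `scratch_words` are illustrative names. Viewing the same 24 bytes as u32 words shows why the opcodes address val_last and reg_last as `scratch[3 + (op & 1)]`.

```python
import struct

# Assumed packing: little-endian, no implicit alignment.
# out_words, 3 pad bytes, out_start, flag_eq, flag_lt, 2 pad bytes, val_last, reg_last, val_ret
SCRATCH_FMT = "<B3xIBB2xIII"
FIELDS = ("out_words", "out_start", "flag_eq", "flag_lt", "val_last", "reg_last", "val_ret")


def parse_scratch(raw):
    """Decode the 24-byte scratch header into a dict of named fields."""
    return dict(zip(FIELDS, struct.unpack_from(SCRATCH_FMT, raw)))


def scratch_words(raw):
    """View the same bytes as the u32 array that the pseudocode indexes as scratch[]."""
    return struct.unpack_from("<6I", raw)
```

With this layout, word 3 is val_last and word 4 is reg_last, so even opcodes (0x00, 0x02, ...) touch the value and odd opcodes touch the register.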


Opcodes 


XXX: Gaps are all sorts of exit routines. Not clear how the exit procedure works wrt status propagation. 
Opcode | Params | Description

0x00 | 1 | SET last value
0x01 | 1 | SET last register
0x02 | 1 | OR last value
0x03 | 1 | OR last register
0x04 | 1 | AND last value
0x05 | 1 | AND last register
0x06 | 1 | ADD last value
0x07 | 1 | ADD last register
0x08 | 1 | SHIFT last value
0x09 | 1 | SHIFT last register
0x0a | 0 | READ last register
0x0b | 1 | READ last register
0x0c | 1 | READ last register
0x0d | 0 | WRITE last register
0x0e | 1 | WRITE last register
0x0f | 1 | WRITE last register
0x10 | 0 | EXIT
0x11 | 0 | EXIT
0x12 | 0 | EXIT
0x13 | 1 | WAIT
0x14 | 2 | WAIT STATUS
0x15 | 2 | WAIT BITMASK last
0x16 | 1 | EXIT
0x17 | 1 | COMPARE last value
0x18 | 1 | BRANCH EQ
0x19 | 1 | BRANCH NEQ
0x1a | 1 | BRANCH LT
0x1b | 1 | BRANCH GT
0x1c | 1 | BRANCH
0x1d | 0 | IRQ DISABLE
0x1e | 0 | IRQ ENABLE
0x1f | 1 | AND last value, register
0x20 | 1 | FB PAUSE/RESUME
0x21 | 2n | SET register(s)
0x22 | 1 | WRITE OUT last value
0x23 | 1 | WRITE OUT indirect last value
0x24 | 2 | WRITE OUT
0x25 | 2 | WRITE OUT indirect
0x26 | 1 | READ OUT last value
0x27 | 1 | READ OUT indirect last value
0x28 | 1 | READ OUT last register
0x29 | 1 | READ OUT indirect last register
0x2a | 2 | ADD OUT
0x2b | 1 | COMPARE OUT
0x2c | 1 | OR last value, register
0x2d | 2 | XXX: Display-related
0x2e | 1 | WAIT
0x2f | 0 | EXIT
0x30 | 1 | OR OUT last value
0x31 | 1 | OR OUT indirect last value
0x32 | 1 | AND OUT last value
0x33 | 1 | AND OUT indirect last value
0x34 | 1 | WRITE OUT TIMESTAMP
0x35 | 1 | WRITE OUT TIMESTAMP indirect
0x38 | 0 | NOP
0x3b | 1 | ADD last value, OUT
0x3c | 1 | ADD last value, OUT indirect
other | 0 | EXIT


Memory


SET last


Set the last register/value in scratch memory. 


Opcode: 0x00 0x01 
Parameters: 1 


Operation: 


scratch[3 + (op & 1)] = param[0];


READ last register 


Read the register addressed by reg_last, by the first parameter, or by their sum, and store the result in val_last.


Opcode: 0x0a 0x0b 0x0c
Parameters: 0/1
Operation:
reg = 0;
if (op == 0xa || op == 0xc)
    reg += scratch->reg_last;
if (op == 0xb || op == 0xc)
    reg += param[0];
scratch->val_last = mmrd(reg);


WRITE last register 


Write val_last to the register addressed by reg_last, by the first parameter, or by their sum.


Opcode: 0x0d 0x0e 0x0f
Parameters: 0/1


Operation:
reg = 0;
if (op == 0xd || op == 0xf)
    reg += scratch->reg_last;
if (op == 0xe || op == 0xf)
    reg += param[0];
mmwr_seq(reg, scratch->val_last);
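READ and WRITE last register differ only in the direction of the transfer; both form the address the same way. The sketch below models that with a Python dict standing in for MMIO (an assumption of the model, as is returning 0 for registers never written); `effective_reg` and `exec_reg_op` are hypothetical names.

```python
# Which address terms each opcode uses: (adds reg_last, adds param[0]).
REG_TERMS = {
    0x0a: (True, False), 0x0b: (False, True), 0x0c: (True, True),  # READ last register
    0x0d: (True, False), 0x0e: (False, True), 0x0f: (True, True),  # WRITE last register
}


def effective_reg(op, reg_last, param=0):
    """Address computed by opcodes 0x0a-0x0f (32-bit wrap-around assumed)."""
    use_last, use_param = REG_TERMS[op]
    reg = 0
    if use_last:
        reg += reg_last
    if use_param:
        reg += param
    return reg & 0xffffffff


def exec_reg_op(op, scratch, mmio, param=0):
    """Apply one of opcodes 0x0a-0x0f; `mmio` is a dict standing in for MMIO space."""
    reg = effective_reg(op, scratch["reg_last"], param)
    if op <= 0x0c:
        scratch["val_last"] = mmio.get(reg, 0)  # mmrd(); unset registers read as 0 here
    else:
        mmio[reg] = scratch["val_last"]         # mmwr_seq()
```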


SET register(s) 


For each register/value pair, this operation performs a register write through mmwr_unlocked, with IRQs disabled for
the whole sequence. The last pair written is then stored in reg_last and val_last.
Opcode: 0x21
Parameters: 2n for n > 0


Operation:


IRQ_DISABLE;
for (i = 0; i < params; i += 2) {
    mmwr_unlocked(param[i], param[i+1]);
}
IRQ_ENABLE;
scratch->reg_last = param[i-2];
scratch->val_last = param[i-1];
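A sketch of SET register(s) under the same dict-as-MMIO assumption; `set_registers` is a hypothetical name, and the IRQ_DISABLE/IRQ_ENABLE bracketing is noted but not modelled.

```python
def set_registers(params, scratch, mmio):
    """Model of opcode 0x21: write each (register, value) pair, then record the last pair."""
    if not params or len(params) % 2:
        raise ValueError("SET register(s) takes 2n parameters, n > 0")
    # The firmware brackets this loop with IRQ_DISABLE / IRQ_ENABLE; not modelled here.
    for reg, val in zip(params[0::2], params[1::2]):
        mmio[reg] = val
    scratch["reg_last"], scratch["val_last"] = params[-2], params[-1]
```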


WRITE OUT last value 


Write val_last to a word of the OUT memory section, at the index given by the first parameter. For the indirect variant,
the parameter points to an 8-bit value describing the offset of the address to write to.


Opcode: 0x22 0x23
Parameters: 1


Operation:


if (!out_start)
    exit(pc);
idx = param[0].u08;
if (idx >= out_words.u08)
    exit(pc);
/* Indirect */
if (op & 0x1) {
    idx = out_start[idx];
    if (idx >= out_words.u08)
        exit(pc);
}
out_start[idx] = scratch->val_last;
WRITE OUT


Write the second parameter to a word of the OUT memory section, at the index given by the first parameter. For the
indirect variant, the first parameter points to an 8-bit value describing the offset of the address to write to.
Opcode: 0x24 0x25
Parameters: 2


Operation:


if (!out_start)
    exit(pc);
idx = param[0].u08;
if (idx >= out_words.u08)
    exit(pc);
/* Indirect */
if (op & 0x1) {
    idx = out_start[idx];
    if (idx >= out_words.u08)
        exit(pc);
}
out_start[idx] = param[1];


READ OUT last value 


Read a word from the OUT memory section into val_last. The parameter is the offset inside the out page. For the
indirect variant, the parameter points to an 8-bit value describing the offset of the value to read.
Opcode: 0x26 0x27
Parameters: 1


Operation:


if (!out_start)
    exit(pc);
idx = param[0].u08;
if (idx >= out_words.u08)
    exit(pc);
/* Indirect */
if (op & 0x1) {
    idx = out_start[idx];
    if (idx >= out_words.u08)
        exit(pc);
}
scratch->val_last = out_start[idx];


READ OUT last register 


Read a word from the OUT memory section into reg_last. The parameter is the offset inside the out page. For the
indirect variant, the parameter points to an 8-bit value describing the offset of the value to read.
Opcode: 0x28 0x29
Parameters: 1


Operation:


if (!out_start)
    exit(pc);
idx = param[0].u08;
if (idx >= out_words.u08)
    exit(pc);
/* Indirect */
if (op & 0x1) {
    idx = out_start[idx];
    if (idx >= out_words.u08)
        exit(pc);
}
scratch->reg_last = out_start[idx];


WRITE OUT TIMESTAMP 


Write the current timestamp to a word of the OUT memory section, at the index given by the first parameter. For the
indirect variant, the parameter points to an 8-bit value describing the offset of the address to write to.


Opcode: 0x34 0x35
Parameters: 1


Operation:


if (!out_start)
    exit(pc);
idx = param[0].u08;
if (idx >= out_words.u08)
    exit(pc);
/* Indirect */
if (op & 0x1) {
    idx = out_start[idx];
    if (idx >= out_words.u08)
        exit(pc);
}
call_timer_read(&value);
out_start[idx] = value;
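All of the OUT opcodes share the same index checks, and the indirect variants add one level of lookup. The helper below factors that out as a hypothetical sketch; it follows the `.u08` truncation used by the WRITE OUT / READ OUT pseudocode, and `SeqExit` is an illustrative stand-in for `exit(pc)`. For the paired OUT opcodes in 0x22-0x35 (e.g. 0x23, 0x31, 0x35) the odd opcode is the indirect one, while ADD last value, OUT uses 0x3c (even) for its indirect form.

```python
class SeqExit(Exception):
    """Stands in for the pseudocode's exit(pc)."""


def resolve_out_index(out, out_words, param, indirect):
    """Return the OUT word index an OUT opcode would access."""
    if out is None:                      # !out_start
        raise SeqExit("no OUT section")
    limit = out_words & 0xff             # out_words.u08
    idx = param & 0xff                   # param[0].u08
    if idx >= limit:
        raise SeqExit(f"OUT index {idx} out of range")
    if indirect:                         # the OUT word holds the real index
        idx = out[idx]
        if idx >= limit:
            raise SeqExit(f"indirect OUT index {idx} out of range")
    return idx
```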


Arithmetic 
OR last 
OR the last register/value in scratch memory. 


Opcode: 0x02 0x03 


Parameters: 1 
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Operation: 


scratch[3 + (op & 1)] |= param[0];


AND last 


AND the last register/value in scratch memory. 
Opcode: 0x04 0x05 
Parameters: 1 


Operation: 


scratch[3 + (op & 1)] &= param[0]; 


ADD last 


ADD the last register/value in scratch memory. 
Opcode: 0x06 0x07 
Parameters: 1 


Operation: 


scratch[3 + (op & 1)] += param[0];


SHIFT-left last 


Shift the last register/value in scratch memory to the left; a negative parameter shifts right.
Opcode: 0x08 0x09 
Parameters: 1 


Operation: 


if (param[0].s08 >= 0) {
    scratch[3 + (op & 1)] <<= sex(param[0].s08);
    break;
} else {
    scratch[3 + (op & 1)] >>= -sex(param[0].s08);
    break;
}
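Opcodes 0x00-0x09 all operate on `scratch[3 + (op & 1)]`, so they reduce to one dispatch on `op >> 1`. The following hypothetical sketch models them; it assumes 32-bit wrap-around and a logical (unsigned) right shift, neither of which the pseudocode spells out.

```python
MASK32 = 0xffffffff


def s8(x):
    """Sign-extend the low 8 bits of x (the pseudocode's .s08)."""
    x &= 0xff
    return x - 0x100 if x & 0x80 else x


def exec_last_arith(op, scratch, param):
    """Model of opcodes 0x00-0x09 acting on scratch[], a list of u32 words."""
    if not 0x00 <= op <= 0x09:
        raise ValueError(f"not a SET/OR/AND/ADD/SHIFT last opcode: {op:#x}")
    slot = 3 + (op & 1)        # even op: val_last, odd op: reg_last
    cur = scratch[slot]
    kind = op >> 1             # 0 SET, 1 OR, 2 AND, 3 ADD, 4 SHIFT
    if kind == 0:
        new = param
    elif kind == 1:
        new = cur | param
    elif kind == 2:
        new = cur & param
    elif kind == 3:
        new = cur + param
    else:
        shift = s8(param)      # negative shift amounts shift right
        new = cur << shift if shift >= 0 else cur >> -shift
    scratch[slot] = new & MASK32   # assumed u32 wrap-around
```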


AND last value, register 


AND val_last with the value read from the register given by the first parameter.
Opcode: 0x1f
Parameters: 1


Operation:
scratch->val_last &= mmrd(param[0]);


ADD OUT 


Add an immediate value (the second parameter) to the word in the OUT memory region selected by the first parameter.
Opcode: 0x2a
Parameters: 2


Operation:


if (!out_start)
    exit(pc);
idx = param[0];
if (idx >= out_len)
    exit(pc);
out_start[idx] += param[1];


OR last value, register 


OR val_last with the value read from the register given by the first parameter.
Opcode: 0x2c
Parameters: 1


Operation:


scratch->val_last |= mmrd(param[0]);


OR OUT last value 


OR val_last into a word in the OUT memory region.
Opcode: 0x30 0x31
Parameters: 1


Operation:


if (!out_start)
    exit(pc);
idx = param[0];
if (idx >= out_len)
    exit(pc);
/* Indirect */
if (op & 0x1) {
    idx = out_start[idx];
    if (idx >= out_words.u08)
        exit(pc);
}
out_start[idx] |= scratch->val_last;
ADD last value, OUT


Add a word from the OUT memory region to val_last.
Opcode: 0x3b 0x3c
Parameters: 1


Operation:


if (!out_start)
    exit(pc);
idx = param[0];
if (idx >= out_len)
    exit(pc);
/* Indirect (0x3c) */
if (!(op & 0x1)) {
    idx = out_start[idx];
    if (idx >= out_words.u08)
        exit(pc);
}
scratch->val_last += out_start[idx];


AND OUT last value 


AND val_last into a word in the OUT memory region.
Opcode: 0x32 0x33
Parameters: 1


Operation:


if (!out_start)
    exit(pc);
idx = param[0];
if (idx >= out_len)
    exit(pc);
/* Indirect */
if (op & 0x1) {
    idx = out_start[idx];
    if (idx >= out_words.u08)
        exit(pc);
}
out_start[idx] &= scratch->val_last;


Control flow 


EXIT 


Terminate the script. Opcode 0x16 exits with its parameter (as a signed 8-bit value); the other exit opcodes exit with -1.
Opcode: 0x10..0x12, 0x16, 0x2f
Parameters: 0/1


Operation:
if (op == 0x16)
    exit(param[0].s08);
else
    exit(-1);


COMPARE last value 


Compare val_last with the parameter. If smaller, set flag_lt. If equal, set flag_eq.
Opcode: 0x17
Parameters: 1


Operation:


flag_eq = 0;
flag_lt = 0;
if (scratch->val_last < param[0])
    flag_lt = 1;
else if (scratch->val_last == param[0])
    flag_eq = 1;


BRANCH EQ 


When compare resulted in eq flag set, branch to an absolute location in the program. 
Opcode: 0x18 
Parameters: 1 


Operation: 


if (flag_eq)
BRANCH param[0]; 


BRANCH NEQ 


When compare resulted in eq flag unset, branch to an absolute location in the program. 
Opcode: 0x19 
Parameters: 1 


Operation: 


if (!flag_eq)
BRANCH param[0]; 
BRANCH LT


When compare resulted in flag_lt set, branch to an absolute location in the program.
Opcode: 0x1a
Parameters: 1


Operation:


if (flag_lt)
    BRANCH param[0];


BRANCH GT 


When compare left both flag_lt and flag_eq unset, branch to an absolute location in the program.
Opcode: 0x1b
Parameters: 1


Operation:


if (!flag_lt && !flag_eq)
    BRANCH param[0];


BRANCH 


Branch to an absolute location in the program. 
Opcode: 0x1c
Parameters: 1 


Operation: 


target = param[0].s16;
if (target >= in_words)
    exit(target);


word exit = 5:9.516 
target &= 0xffff;
target <<= 2;
pc = in_start + target;
if (pc >= in_end)
    exit(in_end);
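The flag logic of COMPARE and the branch conditions can be summarized in a few lines. This is a hypothetical sketch: it assumes unsigned comparison and, unlike the BRANCH pseudocode, treats the target as an unsigned 16-bit word index and raises instead of exiting when it is out of range.

```python
def compare(value, param):
    """COMPARE last value (0x17) / COMPARE OUT (0x2b): return (flag_eq, flag_lt)."""
    if value < param:
        return 0, 1
    if value == param:
        return 1, 0
    return 0, 0


# Branch condition per opcode, as a function of (flag_eq, flag_lt).
BRANCH_TAKEN = {
    0x18: lambda eq, lt: bool(eq),           # BRANCH EQ
    0x19: lambda eq, lt: not eq,             # BRANCH NEQ
    0x1a: lambda eq, lt: bool(lt),           # BRANCH LT
    0x1b: lambda eq, lt: not lt and not eq,  # BRANCH GT
    0x1c: lambda eq, lt: True,               # BRANCH (unconditional)
}


def branch_pc(in_start, in_words, param):
    """Byte address of a branch target given as a word index into the program."""
    target = param & 0xffff
    if target >= in_words:
        raise ValueError(f"branch target {target} outside program")
    return in_start + (target << 2)
```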


COMPARE OUT 


Compare a word in OUT with a parameter. If smaller, set flag_lt. If equal, set flag_eq.
Opcode: 0x2b 
Parameters: 1 


Operation: 
if (!out_start)
    exit(pc);
idx = param[0];
if (idx >= out_words.u08)
    exit(pc);
flag_eq = 0;
flag_lt = 0;
if (out_start[idx] < param[1])
    flag_lt = 1;
else if (out_start[idx] == param[1])
    flag_eq = 1;


Miscellaneous 


WAIT 


Wait for the given number of nanoseconds. Opcode 0x2e is the synchronous variant: it performs mmrd(0) before waiting.
Opcode: 0x13 0x2e
Parameters: 1


Operation:


if (op == 0x2e)
    mmrd(0);
call_timer_wait_nf(param[0]);


WAIT STATUS 


Shifts val_ret left by 1 position and waits until a status bit is set or unset. Sets flag_eq and the LSB of val_ret on
success. The first parameter encodes the desired status; the second parameter contains the timeout.


Old blob


param[0] | Test
0 | UNKNOWN(0x01)
1 | !UNKNOWN(0x01)
2 | FB_PAUSED
3 | !FB_PAUSED
4 | HEAD0_VBLANK
5 | !HEAD0_VBLANK
6 | HEAD1_VBLANK
7 | !HEAD1_VBLANK
8 | HEAD0_HBLANK
9 | !HEAD0_HBLANK
10 | HEAD1_HBLANK
11 | !HEAD1_HBLANK


New blob

In newer blobs (like 337.25), bit 16 encodes negation, bits 8:10 select the status type to wait for, and, where
applicable, bit 0 chooses the HEAD.


param[0] | Test

0x0 | HEAD0_VBLANK
0x1 | HEAD1_VBLANK
0x100 | HEAD0_HBLANK
0x101 | HEAD1_HBLANK
0x300 | FB_PAUSED
0x400 | PGRAPH_IDLE
0x10000 | !HEAD0_VBLANK
0x10001 | !HEAD1_VBLANK
0x10100 | !HEAD0_HBLANK
0x10101 | !HEAD1_HBLANK
0x10300 | !FB_PAUSED
0x10400 | !PGRAPH_IDLE
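Under the new-blob encoding, decoding a WAIT STATUS parameter is a lookup on the low 16 bits plus the negation bit. The sketch below is hypothetical (all names are illustrative); it only accepts the values listed in the table, matching the firmware `switch` falling to its default case for anything else.

```python
# New-blob WAIT STATUS encodings, keyed by param[0] & 0xffff.
STATUS_NAMES = {
    0x000: "HEAD0_VBLANK", 0x001: "HEAD1_VBLANK",
    0x100: "HEAD0_HBLANK", 0x101: "HEAD1_HBLANK",
    0x300: "FB_PAUSED",    0x400: "PGRAPH_IDLE",
}


def decode_wait_status(param):
    """Return e.g. '!HEAD1_HBLANK' for a new-blob WAIT STATUS parameter, or None."""
    name = STATUS_NAMES.get(param & 0xffff)  # bits 8-10: type, bit 0: head
    if name is None:
        return None                          # firmware takes its default case
    negate = param >> 16 & 1                 # bit 16: wait for the status to clear
    return ("!" if negate else "") + name


def encode_wait_status(kind, head=0, negate=False):
    """Inverse mapping; kind is 0 VBLANK, 1 HBLANK, 3 FB_PAUSED, 4 PGRAPH_IDLE."""
    return (int(negate) << 16) | (kind << 8) | (head if kind in (0, 1) else 0)
```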


Todo: Why isn't flag_eq unset on failure? Find out switching point from old to new format?


Opcode: 0x14 


Parameters: 2 


Operation OLD BLOB:
val_ret *= 2;
test_params[1] = param[0] & 1;
test_params[2] = I[0x7c4];
switch ((param[0] & ~1) - 2) {
default:
    test_params[0] = 0x01;
    break;
case 0:
    test_params[0] = 0x04;
    break;
case 2:
    test_params[0] = 0x08;
    break;
case 4:
    test_params[0] = 0x20;
    break;
case 6:
    test_params[0] = 0x10;
    break;
case 8:
    test_params[0] = 0x40;
    break;
}
if (call_timer_wait(&input_bittest, test_params, param[1])) {
    flag_eq = 1;
    val_ret |= 1;
}
Operation NEW BLOB:
b32 (*f)(b32 x);
unk3ec[2] <<= 1;
test_params[2] = 0x1f100; // 7c4
test_params[1] = (param[0] >> 16) & 0x1;
switch (param[0] & 0xffff) {
case 0x0:
    test_params[0] = 0x8;
    f = &input_test;
    break;
case 0x1:
    test_params[0] = 0x20;
    f = &input_test;
    break;
case 0x100:
    test_params[0] = 0x10;
    f = &input_test;
    break;
case 0x101:
    test_params[0] = 0x40;
    f = &input_test;
    break;
case 0x300:
    test_params[0] = 0x04;
    f = &input_test;
    break;
case 0x400:
    test_params[0] = 0x400;
    f = &pgraph_test;
    break;
default:
    f = NULL;
    break;
}
if (f && timer_wait(f, param, timeout) != 0) {
    unk3e8 = 1;
    unk3ec[2] |= 1;
}


WAIT BITMASK last 


Shifts val_ret left by 1 position and waits until the AND of the register addressed by reg_last and the first
parameter equals val_last. Sets flag_eq and the LSB of val_ret on success. The first parameter encodes the bitmask to
test. The second parameter contains the timeout.
Todo: Why isn’t flag_eq unset on failure?
Opcode: 0x15 

Parameters: 2 


Operation: 


b32 seq_cb_wait(b32 parm) {
    return (mmrd(scratch->reg_last) & parm) == scratch->val_last;
}

val_ret *= 2;
if (call_timer_wait(seq_cb_wait, param[0], param[1]))
    break;
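A host-side model of the polling behaviour of WAIT BITMASK last. `wait_bitmask` and the `state` dict are hypothetical; the timeout unit is assumed to be nanoseconds by analogy with WAIT, which the source does not confirm. As the Todo notes, a timeout leaves flag_eq untouched.

```python
import time


def wait_bitmask(read_reg, state, mask, timeout_ns):
    """Model of WAIT BITMASK last (0x15).

    `state` holds reg_last, val_last, val_ret and flag_eq; `read_reg` stands in
    for mmrd(). Returns True if the condition was met before the timeout.
    """
    state["val_ret"] = (state["val_ret"] << 1) & 0xffffffff
    deadline = time.monotonic_ns() + timeout_ns
    while True:
        if (read_reg(state["reg_last"]) & mask) == state["val_last"]:
            state["flag_eq"] = 1
            state["val_ret"] |= 1
            return True
        if time.monotonic_ns() >= deadline:
            return False  # flag_eq is not cleared on failure
```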


IRQ DISABLE


Disable IRQs and increment the reference counter irqlock_lvl.
Opcode: 0x1d
Parameters: 0


Operation:


interrupt_enable_0 = interrupt_enable_1 = false;
irqlock_lvl++;


IRQ ENABLE


Decrement the reference counter irqlock_lvl and enable IRQs once it reaches 0.
Opcode: 0x1e
Parameters: 0


Operation:


if (--irqlock_lvl == 0)
    interrupt_enable_0 = interrupt_enable_1 = true;
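IRQ DISABLE and IRQ ENABLE nest through irqlock_lvl. A hypothetical model of that counter (`IrqLock` is an illustrative name; `enabled` stands for interrupt_enable_0/1):

```python
class IrqLock:
    """Model of the irqlock_lvl reference counter behind IRQ DISABLE / IRQ ENABLE."""

    def __init__(self):
        self.level = 0
        self.enabled = True        # interrupt_enable_0 / interrupt_enable_1

    def disable(self):             # opcode 0x1d
        self.enabled = False
        self.level += 1

    def enable(self):              # opcode 0x1e
        self.level -= 1
        if self.level == 0:        # only the outermost ENABLE turns IRQs back on
            self.enabled = True
```

This nesting is what lets opcodes that disable IRQs internally, such as SET register(s), run inside a script section that has already disabled them without re-enabling IRQs too early.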


FB PAUSE/RESUME 


If the parameter is nonzero, disable IRQs on PDAEMON and pause the framebuffer (memory); otherwise, resume the FB and re-enable IRQs.
Opcode: 0x20 
Parameters: 1 


Operation: 


if (param[0]) {
    IRQ_DISABLE;

    /* XXX What does this bit do? */
    mmwrs(0x1610, (mmrd(0x1610) & ~3) | 2);
    mmrd(0x1610);

    mmwrs(0x1314, (mmrd(0x1314) & ~0x10001) | 0x10001);

    /* RNN:PDAEMON.INPUT0_STATUS.FB_PAUSED */
    while (!(RD(0x7c4) & 4));

    mmwr_seq = &mmwr_unlocked;
} else {
    mmwrs(0x1314, mmrd(0x1314) & ~0x10001);
    while (RD(0x7c4) & 4);

    mmwrs(0x1610, mmrd(0x1610) & ~0x33);
    IRQ_ENABLE;

    mmwr_seq = &mmwrs;
}


3.1.2 PMU microcode commands 


Contents 


* PMU microcode commands 
— Introduction 
* Sample Implementation 


— Commands 


— Command Status 


— Error Codes 


Introduction 


Todo: write me 


Sample Implementation 


Example of setting up and running a command, and of handling potential error or timeout states.


Pseudocode:

// Define interface to Falcon
#define PDAEMON_SCRATCH0 0x10a040
#define PDAEMON_SCRATCH1 0x10a044

// Preparatory step
#define PUNITS_UNK008 0x022408
temp = nvkm_rd32(PUNITS_UNK008);
nvkm_wr32(PUNITS_UNK008, temp | 0x2);

// Prepare and send PMU microcode command
command_id = NV_UCODE_CMD_COMMAND_EID;    // 0x02
command_status = NV_UCODE_CMD_STS_NEW;    // 0x01
command_packet = (command_id & 0xFFFFFFF) | (command_status << 28);
nvkm_wr32(PDAEMON_SCRATCH0, command_packet);

// Loop whilst awaiting response
for (i = 0; i < 50000; ++i) {
    pmu_command_response = nvkm_rd32(PDAEMON_SCRATCH0);
    pmu_command_response_status = pmu_command_response & 0xF0000000;
    if (pmu_command_response_status == 0x30000000) // NV_UCODE_CMD_STS_COMPLETE
        break;
    if ((pmu_command_response_status != 0x20000000) && // NV_UCODE_CMD_STS_PENDING
        (pmu_command_response_status != 0x10000000))   // NV_UCODE_CMD_STS_NEW
    {
        RESPONSE_UNK1 = 1;
        break;
    }
    if (i == 50000-1)
        RESPONSE_UNK2 = 1;
}
if (RESPONSE_UNK1 || RESPONSE_UNK2) {
    // Handle timeouts
}
else {
    pmu_error_code = nvkm_rd32(PDAEMON_SCRATCH1);
    if (pmu_error_code & 0x7FFFFFFF) {
        // Handle error code
    }
    if ((pmu_error_code & 0x80000000) == 0x80000000) {
        // getlog_cmd_do()
    }
    // Handle PMU command responses
}


Commands 


XXX: Gaps expected. Based upon PMU microcode shipped with 390.67 
Opcode | Name | Description

0x00 | NV_UCODE_CMD_COMMAND_NONE
0x01
0x02 | NV_UCODE_CMD_COMMAND_EID | EEPROM ID
0x03 | NV_UCODE_CMD_COMMAND_ESI | Structure Init
0x04 | NV_UCODE_CMD_COMMAND_ERD | Read EEPROM
0x05 | NV_UCODE_CMD_COMMAND_EWR | Write EEPROM
0x06 | NV_UCODE_CMD_COMMAND_ESE | Erase Sector
0x07 | NV_UCODE_CMD_COMMAND_ECE | Erase Chip
0x08 | NV_UCODE_CMD_COMMAND_RRD | Read priv register using PMU microcode
0x09 | NV_UCODE_CMD_COMMAND_RWR | Write priv register using PMU microcode
0x0a | NV_UCODE_CMD_COMMAND_PREP
0x0b | NV_UCODE_CMD_COMMAND_CLOSE
0x0c | NV_UCODE_CMD_COMMAND_EPROT | Set Software Protection
0x0d | NV_UCODE_CMD_COMMAND_ERDSR | Read Status Register
0x0e | NV_UCODE_CMD_COMMAND_VV | Verify VBIOS
0x0f | NV_UCODE_CMD_COMMAND_ECID
0x10 | NV_UCODE_CMD_COMMAND_LICVERIFY
0x11 | NV_UCODE_CMD_COMMAND_BSI_INFO
0x12 | NV_UCODE_CMD_COMMAND_HULKPROC
0x13 | NV_UCODE_CMD_COMMAND_АКВ
0x14 | NV_UCODE_CMD_COMMAND_ОХК(14 | Related to license file generation
0x15
0x16
0x17 | NV_UCODE_CMD_COMMAND_OTP_READ
0x18 | NV_UCODE_CMD_COMMAND_OTP_READLOCK


Command Status 


Opcode | Name
0x00 | NV_UCODE_CMD_STS_NONE
0x01 | NV_UCODE_CMD_STS_NEW
0x02 | NV_UCODE_CMD_STS_PENDING
0x03 | NV_UCODE_CMD_STS_COMPLETE
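The sample implementation places the command status in bits 28-31 of PDAEMON_SCRATCH0 (it masks responses with 0xF0000000) and splits PDAEMON_SCRATCH1 into a 31-bit error code and a flag in bit 31. The helpers below sketch that packing under the same assumptions; the names are hypothetical and the layout is inferred from the sample pseudocode rather than documented.

```python
# Command status values; the sample implementation compares them in bits 28-31.
STS_NONE, STS_NEW, STS_PENDING, STS_COMPLETE = 0x0, 0x1, 0x2, 0x3


def make_command_packet(command_id, status=STS_NEW):
    """Word written to PDAEMON_SCRATCH0: command id in bits 0-27, status in bits 28-31."""
    return (command_id & 0x0FFFFFFF) | (status << 28)


def response_status(scratch0):
    """Extract the status nibble from a value read back from PDAEMON_SCRATCH0."""
    return scratch0 >> 28 & 0xF


def split_error(scratch1):
    """Split PDAEMON_SCRATCH1 into (error code, bit 31 flag)."""
    return scratch1 & 0x7FFFFFFF, bool(scratch1 & 0x80000000)
```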


Error Codes 


XXX: Gaps expected. Based upon PMU microcode shipped with 390.67 


Opcode | Name | Description

0x00 | NV_UCODE_ERR_CODE_CMD_NOERROR | No error
0x01 | NV_UCODE_ERR_CODE_CMD_TIMEOUT | Timeout occurred
0x02 | NV_UCODE_ERR_CODE_CMD_DEPENDENCY | May need other c
0x03 | NV_UCODE_ERR_CODE_CMD_EID_RD_ERROR | EEPROM ID proc
0x04 | NV_UCODE_ERR_CODE_CMD_ERD_BUF_WR_ERROR | Cannot write mor
0x05 | NV_UCODE_ERR_CODE_CMD_EWR_BUF_RD_ERROR | Cannot read more
0x06 | NV_UCODE_ERR_CODE_CMD_UNSUPPORTED_CPU
0x07 | NV_UCODE_ERR_CODE_CMD_UNSUPPORTED_COMMAND | Invalid command
0x08 | NV_UCODE_ERR_CODE_CMD_UNSUPPORTED_PARAMETER | Supplied paramet
0x09 | NV_UCODE_ERR_CODE_CMD_SECURE_REV_LOCK_VIOLATION
0x0a | NV_UCODE_ERR_CODE_LOAD_VBIOS_VERIFY_UCODE_FAIL
0x0b | NV_UCODE_ERR_CODE_CMD_VBIOS_VERIFY_DEBUG_FUSE_BOARD
0x0c | NV_UCODE_ERR_CODE_CMD_VBIOS_VERIFY_DEVID_FAIL
0x0d | NV_UCODE_ERR_CODE_CMD_VBIOS_VERIFY_CERT_NOT_FOUND
0x0e | NV_UCODE_ERR_CODE_CMD_VBIOS_VERIFY_CERT_PARSE_FAIL
0x0f | NV_UCODE_ERR_CODE_CMD_VBIOS_VERIFY_CERT_VERIFY_FAIL
0x10 | NV_UCODE_ERR_CODE_CMD_VBIOS_VERIFY_HAT_FAIL
0x11 | NV_UCODE_ERR_CODE_CMD_VBIOS_VERIFY_BIOS_SIG_FAIL
0x12 | NV_UCODE_ERR_CODE_CMD_VBIOS_VERIFY_HULK_INIT_FAIL
0x13 | NV_UCODE_ERR_CODE_CMD_VBIOS_VERIFY_HULK_KA_NOT_FOUND
0x14 | NV_UCODE_ERR_CODE_CMD_VBIOS_VERIFY_HULK_TYPE_INVALID
0x15 | NV_UCODE_ERR_CODE_CMD_VBIOS_VERIFY_HULK_SIG_INVALID
0x16 | NV_UCODE_ERR_CODE_CERT_UNKNOWN_ERROR
0x17 | NV_UCODE_ERR_CODE_CERT_EXT_NOT_FOUND
0x18 | NV_UCODE_ERR_CODE_CERT_SIGNATURE_NOT_FOUND
0x19 | NV_UCODE_ERR_CODE_CERT_RSAIK_SIGNATURE_INVALID
0x1a | NV_UCODE_ERR_CODE_CERT_EXT_NO_SUB_STRUCT_FOUND
0x1b | NV_UCODE_ERR_CODE_CERT_UNSUPPORTED_VERSION
0x1c | NV_UCODE_ERR_CODE_CERT_NO_EXTENSION_EXIST
0x1d | NV_UCODE_ERR_CODE_CERT_ТТОУІ_PAYLOAD_SIZE_ERROR
0x1e | NV_UCODE_ERR_CODE_CERT_T7_SW_FEATURE_PAYLOAD_SIZE_ERROR
0x1f | NV_UCODE_ERR_CODE_CERT_T7_UNSUPPORTED_HW_STRUCT_VERSION
0x20 | NV_UCODE_ERR_CODE_CERT_T7_EXTENSIONS_NUM_EXCEED_LIMIT
0x21 | NV_UCODE_ERR_CODE_CERT_UGPU_PERSONALITY_MIS_MATCH
0x22 | NV_UCODE_ERR_CODE_CERT_UNKNOWN_HULK_FEATURE
0x23 | NV_UCODE_ERR_CODE_CERT_HULK_ECID_MISMATCH
0x24 | NV_UCODE_ERR_CODE_CERT_HULK_ECID_ENCODING_UNKNOWN
0x25 | NV_UCODE_ERR_CODE_ECID_ENCODING_ALGO_UNKNOWN
0x26 | NV_UCODE_ERR_CODE_CERT_T7_КЕС_OVERRIDE_TYPE_UNKNOWN
0x27 | NV_UCODE_ERR_CODE_LICVERIFY_UNSUPPORTED_LIC_TYPE
0x28 | NV_UCODE_ERR_CODE_UNSUPPORTED_CONFIG
0x29 | NV_UCODE_ERR_CODE_BSI_INFO_BRSS_INVALID
0x2a | NV_UCODE_ERR_CODE_IMEM_TO_DMEM_COPY_INVALID_PARA
0x2b | NV_UCODE_ERR_CODE_DERIVED_KEY_TYPE_INVALID
0x2c | NV_UCODE_ERR_CODE_UCODE_NOT_IN_HS_MODE
0x2d | NV_UCODE_ERR_CODE_VBIOS_DEVINIT_OFFSETS_INVALID
0x2e | NV_UCODE_ERR_CODE_VBIOS_DEVINIT_SIG_INVALID
0x2f | NV_UCODE_ERR_CODE_CERT_HULK_DEVID_MISMATCH
0x30 | NV_UCODE_ERR_CODE_CERT_HULK_NO_ID_MATCH_FOUND
0x31 | NV_UCODE_ERR_CODE_CERT_HULK_DATA_BUFFER_TOO_SMALL
0x32 | NV_UCODE_ERR_CODE_CERT_HULK_INFOROM_NOT_FOUND
0x33 | NV_UCODE_ERR_CODE_CERT_HULK_INFOROM_UL_GLOB_NOT_FOUND
0x34 | NV_UCODE_ERR_CODE_CERT_HULK_INFOROM_HLK_OBJ_NOT_VALID
0x35 | NV_UCODE_ERR_CODE_CERT_UGPU_LICENSE_PROCESSING_FAILED
0x36 | NV_UCODE_ERR_CODE_UGPU_PROCESSING_FAILED_INVALID_ULF_OBJECT
0x37 | NV_UCODE_ERR_CODE_UGPU_PROCESSING_FAILED_INVALID_UPR_OBJECT
0x38 | NV_UCODE_ERR_CODE_CERT20_INTBLK_VDPA_HEADER_INVALID
0x39 | NV_UCODE_ERR_CODE_CERT20_INTBLK_INT_SIG_HEADER_INVALID
0x3a | NV_UCODE_ERR_CODE_CERT20_INTBLK_INT_SIG_CRYPTO_UNDEFINED
0x3b | NV_UCODE_ERR_CODE_CERT20_VDPA_UNEXPECTED_MAJOR_TYPE
0x3c | NV_UCODE_ERR_CODE_CERT20_VDPA_UNEXPECTED_MINOR_TYPE
0x3d | NV_UCODE_ERR_CODE_CERT20_VDPA_ENTRY_SIZE_LARGER_THAN_DATA_BUFFER
0x3e | NV_UCODE_ERR_CODE_CERT20_VDPA_UNEXPECTED_CODE_TYPE
0x3f | NV_UCODE_ERR_CODE_CERT20_VDPA_NOT_FINALIZED
0x40 | NV_UCODE_ERR_CODE_CERT20_VDPA_SIG_INVALID
0x41 | NV_UCODE_ERR_CODE_CERT20_VDPA_ENTRY_NOT_FOUND
0x42 | NV_UCODE_ERR_CODE_CERT20_VDPA_CERT_INTBLK_MISMATCH
0x43 | NV_UCODE_ERR_CODE_CERT20_VDPA_ENTRY_FOUND_DATA_MISMATCH
0x44 | NV_UCODE_ERR_CODE_CERT20_VDPA_DATA_INVALID
0x45 | NV_UCODE_ERR_CODE_CERT20_VDPA_FLASH_SIZE_LARGER_THAN_EXPECTED
0x46 | NV_UCODE_ERR_CODE_CERT20_VDPA_DEVID_MISMATCH
0x47 | NV_UCODE_ERR_CODE_GPU_INITIALIZATION_TABLES_SIG_CHECK_FAILED | Also known as N'
0x48 | NV_UCODE_ERR_CODE_GPU_INITIALIZATION_SCRIPTS_SIG_CHECK_FAILED | Also known as N'
0x49
0x4a | NV_UCODE_ERR_CODE_VERIFY_ENG_HULK_LICENSE_NOT_PRESENT
0x4b | NV_UCODE_ERR_CODE_VERIFY_ENG_HULK_LICENSE_KA_NOT_FOUND
0x4c | NV_UCODE_ERR_CODE_VERIFY_ENG_HULK_LICENSE_TYPE_INVALID
0x4d | NV_UCODE_ERR_CODE_VERIFY_ENG_HULK_3AES_SIG_MISMATCH_WITH_GPU_FUSE
0x4e | NV_UCODE_ERR_CODE_VERIFY_ENG_HULK_NO_3AES_SIG
0x4f | NV_UCODE_ERR_CODE_VERIFY_ENG_HULK_LICENSE_HULK_AES_SIG_INVALID
0x50 | NV_UCODE_ERR_CODE_VERIFY_ENG_HULK_LICENSE_NVF_ENG_AES_SIG_INVALID
0x51 | NV_UCODE_ERR_CODE_CHECK_ERASE_LICENSE_ERASE_DISALLOWED
0x52 | NV_UCODE_ERR_CODE_CMD_PREP_LICENSE_SIZE_OVERFLOW
0x53 | NV_UCODE_ERR_CODE_CMD_EWR_NO_ERASE_NOT_PERMITTED
0x54 | NV_UCODE_ERR_CODE_CMD_EWR_NO_VERIFY_NOT_PERMITTED
0x55 | NV_UCODE_ERR_CODE_CMD_ESE_NOT_PERMITTED
0x56 | NV_UCODE_ERR_CODE_CMD_ECE_NOT_PERMITTED
0x57 | NV_UCODE_ERR_CODE_CERT20_VDPA_UNEXPECTED_INSTANCE
0x58 | NV_UCODE_ERR_CODE_DEVID_MATCH_LIST_MORE_DEVIDS_THAN_BUFFERS
0x59 | NV_UCODE_ERR_CODE_DEVID_MATCH_LIST_SIG_INVALID
0x5a | NV_UCODE_ERR_CODE_DEVID_MATCH_LIST_DEVID_MATCH_FAILED
0x5b | NV_UCODE_ERR_CODE_DEVID_MATCH_LIST_DEVID_NOT_FOR_THE_GPU
0x5c | NV_UCODE_ERR_CODE_DEVID_MATCH_LIST_DEVID_OUT_OF_HAT_COVERAGE
0x5d | NV_UCODE_ERR_CODE_PUSH_POLL_DMEM_COPY_BUFFER_OVERFLOW
0x5e | NV_UCODE_ERR_CODE_PUSH_POLL_DMEM_COPY_DATA_OUT_OF_RANGE
0x5f | NV_UCODE_ERR_CODE_CERT20_INTBLK_VDPA_BLOCK_OVERSIZE
0x60
0x61
0x62
0x63
0x64
0x65
0x66
0x67
0x68
0x69
0x6a
0x6b
0x6c | NV_UCODE_ERR_CODE_CMD_EWR_OK_TO_FLASH_CHECK_FAILED
0x6e | NV_UCODE_ERR_CODE_HW_SPI_TIMEOUT
0x6f
0x70
0x71
0x72
0x73 | NV_UCODE_ERR_CODE_CERT21_FMT_HAT_ENTRY_NUMBER_INVALID
0x74 | NV_UCODE_ERR_CODE_CERT21_FMT_HAT_ENTRY_FOMMATTER_TOO_LONG
0x75 | NV_UCODE_ERR_CODE_CERT21_FMT_FORMATTER_DATA_BLOCK_OVER_SIZE
0x76 | NV_UCODE_ERR_CODE_CERT21_FMT_UNEXPECTED_FORMATTER_TYPE
0x77 | NV_UCODE_ERR_CODE_CERT21_FMT_EXCEED_FORMATTER_LENGTH
0x78 | NV_UCODE_ERR_CODE_EEPROM_OTP_DEVICE_UNSUPPORTED
0x79 | NV_UCODE_ERR_CODE_EEPROM_OTP_ERASE_NOT_PRESENT
0x7a | NV_UCODE_ERR_CODE_EEPROM_OTP_FACTORY_LOCK_NOT_PRESENT
0x7b | NV_UCODE_ERR_CODE_EEPROM_OTP_FACTORY_REGION_NOT_PRESENT
0x7c | NV_UCODE_ERR_CODE_EEPROM_OTP_USER_ADDRESS_OUT_OF_RANGE
0x7d | NV_UCODE_ERR_CODE_EEPROM_OTP_FACTORY_ADDRESS_OUT_OF_RANGE
0x7f
0x80
0x81
0x82 | NV_UCODE_ERR_CODE_PLAY_READY_PDUB_SIG_INVALID
0x83 | NV_UCODE_ERR_CODE_PLAY_READY_PDUB_ENTRY_NOT_FOUND
0x84 | NV_UCODE_ERR_CODE_PLAY_READY_EXIT_FOR_DEVINIT_NOT_RUN
0x85 | NV_UCODE_ERR_CODE_PLAY_READY_PDUB_PRIV_CONN_STATE_MISMATCH
0x86 | NV_UCODE_ERR_CODE_PLAY_READY_OTP_ENTRY_NOT_AVAILABLE
0x87 | NV_UCODE_ERR_CODE_PLAY_READY_SEC2_MUTEX_ACQUIRE_FAILED
0x88 | NV_UCODE_ERR_CODE_PLAY_READY_SEC2_MUTEX_RELEASE_FAILED
0x89 | NV_UCODE_ERR_CODE_VERIFY_ENG_LICENSE_INCORRECT_TYPE
0x8a | NV_UCODE_ERR_CODE_INVALID_FALCON
0x8b | NV_UCODE_ERR_CODE_NUM_REPAIR_ENTRIES_EXCEEDS_MAX_ALLOWED
0x8c | NV_UCODE_ERR_CODE_INVALID_REPAIR_OBJECT
0x8d | NV_UCODE_ERR_CODE_BCRT2x_CERT_BUFFER_OVERFLOW
0x8e | NV_UCODE_ERR_CODE_BCRT2x_HAT_ENTRIES_BUFFER_OVERFLOW
0x8f | NV_UCODE_ERR_CODE_BCRT2x_HAT_HEADER_OVER_SIZE
0x90 | NV_UCODE_ERR_CODE_BCRT2x_RSA_SIG_HEADER_OVER_SIZE
0x91
0x92 | NV_UCODE_ERR_CODE_BCRT2X_CERT_BLOCK_VERSION_UNEXPECTED
0x93 | NV_UCODE_ERR_CODE_BCRT2X_CERT_CONTROL_HEADER_OVERFLOW
0x94 | NV_UCODE_ERR_CODE_BCRT2X_MAX_SECURITYZONE_REACHED
0x95 | NV_UCODE_ERR_CODE_BCRT2X_SECURITYZONE_SIGNATURES_SIZE_CHECK_FAILED
0x96 | NV_UCODE_ERR_CODE_BCRT2X_SECURITYZONE_SIG_STRUCT_SIZE_CHECK_FAILED
0x97 | NV_UCODE_ERR_CODE_BCRT2X_SECURITYZONE_SIG_ZONE_NUM_INVALID
0x98 | NV_UCODE_ERR_CODE_BCRT2X_SECURITYZONE_SIG_ALGO_INVALID
0x99 | NV_UCODE_ERR_CODE_BCRT2X_SECURITYZONE_BUILT_IN_SEC_ZONE_MISSING
0x9a | NV_UCODE_ERR_CODE_BCRT2X_SECURITYZONE_SIGNATURE_INVALID
0x9b | NV_UCODE_ERR_CODE_BCRT2X_SECURITYZONE_SIG_NOT_FOUND
0x9c | NV_UCODE_ERR_CODE_BCRT2X_VDPA_ENTRY_VERIFY_HASH_MISMATCH
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Table 2 — continued from previous page 


Opcode | Name Description 


0х94 NV UCODE ERR CODE ВСКТ2Х VDPA INTBLK ENTRIES NUM EXCEED MAX 
CHAPTER 4

envydis and envyas documentation

Contents

* envydis and envyas documentation
  - Using envydis and envyas
    * Input format
    * Input subranging
    * Variant selection
    * Output
    * Output format


4.1 Using envydis and envyas 


envydis reads from standard input and prints the disassembly to standard output. By default, input is parsed as a
sequence of space- or comma-separated hexadecimal numbers representing the bytes to disassemble.


envyas reads assembly from standard input and outputs to the filename specified by -o «filename». 


The options are: 


4.1.1 Input format 
-w 
(envydis only) Instead of a sequence of hexadecimal bytes, treat input as a sequence of hexadecimal 32-bit words


-W
(envydis only) Instead of a sequence of hexadecimal bytes, treat input as a sequence of hexadecimal 64-bit words


-i
(envydis only) Treat input as pure binary
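As a rough illustration of the three text input modes (default bytes, -w, -W), the following Python sketch turns such text into a byte string. It is not envydis's actual parser: the function name parse_hex_input is hypothetical, and splitting each word into bytes in little-endian order is an assumption, not something stated in this section.

```python
import re


def parse_hex_input(text, word_bytes=1):
    """Hypothetical sketch of envydis text input handling.

    word_bytes=1 models the default (hex bytes), 4 models -w (32-bit words)
    and 8 models -W (64-bit words). Splitting words into bytes in
    little-endian order is an assumption of this sketch.
    """
    data = bytearray()
    # Tokens are separated by whitespace and/or commas.
    for token in re.split(r"[\s,]+", text.strip()):
        if not token:
            continue  # empty input produces a single empty token
        value = int(token, 16)  # accepts both "de" and "0xde"
        # to_bytes raises OverflowError if the value does not fit in one word.
        data += value.to_bytes(word_bytes, "little")
    return bytes(data)
```

Under these assumptions, parse_hex_input("de ad,be ef") yields the bytes de ad be ef, while parse_hex_input("deadbeef", 4) yields ef be ad de.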


4.1.2 Input subranging 
-b «base» 
(envydis only) Assume the start of input to be at address «base» in code segment 


-d «discard» 
(envydis only) Discard that many bytes of input before starting to read code 


-l «limit»
(envydis only) Don’t disassemble more than «limit» bytes 
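The subranging options can be modelled as slicing the parsed input. The Python sketch below is hypothetical (select_range is not an envydis function) and assumes one particular order of operations: «discard» bytes are dropped first, at most «limit» of the remaining bytes are kept, and «base» is the address of the first byte left after discarding. If envydis instead assigns «base» to the very first input byte, every address would shift by «discard».

```python
def select_range(data, base=0, discard=0, limit=None):
    """Hypothetical model of envydis -b/-d/-l.

    Returns (address, byte) pairs for the bytes that would be disassembled,
    assuming discard is applied before limit and that base labels the
    first byte remaining after the discard.
    """
    code = data[discard:]  # -d: skip leading bytes
    if limit is not None:
        code = code[:limit]  # -l: cap the number of bytes
    # -b: number the remaining bytes starting at base
    return [(base + offset, byte) for offset, byte in enumerate(code)]
```

For example, select_range(bytes(range(10)), base=0x100, discard=2, limit=3) returns [(0x100, 2), (0x101, 3), (0x102, 4)] under these assumptions.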


4.1.3 Variant selection 
-m «machine» 
Select the ISA to disassemble/assemble. One of: 
* [****] g80: tesla CUDA/shader ISA
* [*** ] gf100: fermi CUDA/shader ISA
* [**  ] gk110: kepler GK110 CUDA/shader ISA
* [*** ] gm107: maxwell CUDA/shader ISA
* [**  ] ctx: nv40 and g80 PGRAPH context-switching microcode
* [*** ] falcon: falcon microcode, used to power various engines on G98+ cards
* [****] hwsq: PBUS hardware sequencer microcode
* [****] xtensa: xtensa variant as used by video processor 2 [g84-gen]
* [*** ] vuc: video processor 2/3 master/mocomp microcode
* [****] macro: gf100 PGRAPH macro method ISA
* [**  ] vp1: video processor 1 [nv41-gen] code
* [****] vcomp: PVCOMP video compositor microcode

Where the quality level is:

* [    ]: Bare beginnings
* [*   ]: Knows a few instructions
* [**  ]: Knows enough instructions to write some simple code
* [*** ]: Knows most instructions, enough to write advanced code
* [****]: Knows all instructions, or very close to.


-V «variant» 
Select variant of the ISA. 


For g80:
* g80: The original G80 [aka compute capability 1.0]
* g84: G84, G86, G92, G94, G96, G98 [aka compute capability 1.1]
* g200: G200 [aka compute capability 1.3]
* mcp77: MCP77, MCP79 [aka compute capability 1.2]
* gt215: GT215, GT216, GT218, MCP89 [aka compute capability 1.2 + d3d10.1]
For gf100:
* gf100: GF100:GK104 cards
* gk104: GK104+ cards
For ctx:
* nv40: NV40:G80 cards
* g80: G80:G200 cards
* g200: G200:GF100 cards
For hwsq:
* nv17: NV17:NV41 cards
* nv41: NV41:G80 cards
* g80: G80:GF100 cards
For falcon:
* fuc0: falcon version 0 [G98, MCP77, MCP79]
* fuc3: falcon version 3 [GT215 and up]
* fuc4: falcon version 4 [GF119 and up, selected engines only]
* fuc5: falcon version 5 [GK208 and up, selected engines only]
* fuc6: falcon version 6 [GP102 and up, selected engines only]
For vuc:
* vp2: VP2 video processor [G84:G98, G200]
* vp3: VP3 video processor [G98, MCP77, MCP79]
* vp4: VP4 video processor [GT215:GF119]


-F «feature» 
Enable an optional ISA feature. Most of these are auto-selected by -V, but they can also be specified manually. This option
can be given multiple times to enable several features.


For g80:
* sm11: SM1.1 new opcodes [selected by g84, g200, mcp77, gt215]
* sm12: SM1.2 new opcodes [selected by g200, mcp77, gt215]
* fp64: 64-bit floating point [selected by g200]
* d3d10_1: Direct3D 10.1 new features [selected by gt215]
For gf100:
* gf100op: GF100:GK104 exclusive opcodes [selected by gf100]
* gk104op: GK104+ exclusive opcodes [selected by gk104]
For ctx:
* nv40op: NV40:G80 exclusive opcodes [selected by nv40]
* g80op: G80:GF100 exclusive opcodes [selected by g80, g200]


4.1. Using envydis and envyas 545 


nVidia Hardware Documentation, Release git 


* callret: call/ret opcodes [selected by g200]
For hwsq:
* nv17f: NV17:G80 flags [selected by nv17, nv41]
* nv41f: NV41:G80 flags [selected by nv41]
* nv41op: NV41 new opcodes [selected by nv41, g80]
For falcon:
* fuc0op: falcon version 0 exclusive opcodes [selected by fuc0]
* fuc3op: falcon version 3+ exclusive opcodes [selected by fuc3, fuc4, fuc5, fuc6]
* fuc4op: falcon version 4+ exclusive opcodes [selected by fuc4, fuc5, fuc6]
* fuc5op: falcon version 5+ exclusive opcodes [selected by fuc5, fuc6]
* fuc6op: falcon version 6+ exclusive opcodes [selected by fuc6]
* pc24: 24-bit PC opcodes [selected by fuc4] 
* crypt: Cryptographic coprocessor opcodes [has to be manually selected] 
For vuc: 
* vp2op: VP2 exclusive opcodes [selected by vp2] 
* vp3op: VP3+ exclusive opcodes [selected by vp3, vp4] 
* vp4op: VP4 exclusive opcodes [selected by vp4] 
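The "[selected by ...]" annotations describe a many-to-many mapping from -V variants to -F features. The Python sketch below transcribes that mapping for the falcon and vuc ISAs from the lists above and computes the feature set a given -m/-V/-F combination would enable under that reading. The names AUTO_FEATURES and enabled_features are hypothetical; envydis's internal tables may be organised differently.

```python
# Hypothetical transcription of the "[selected by ...]" annotations
# for two ISAs: feature name -> set of -V variants that auto-select it.
AUTO_FEATURES = {
    "falcon": {
        "fuc0op": {"fuc0"},
        "fuc3op": {"fuc3", "fuc4", "fuc5", "fuc6"},
        "fuc4op": {"fuc4", "fuc5", "fuc6"},
        "fuc5op": {"fuc5", "fuc6"},
        "fuc6op": {"fuc6"},
        "pc24": {"fuc4"},
        # "crypt" is deliberately absent: it is never auto-selected and
        # has to be requested explicitly with -F crypt.
    },
    "vuc": {
        "vp2op": {"vp2"},
        "vp3op": {"vp3", "vp4"},
        "vp4op": {"vp4"},
    },
}


def enabled_features(machine, variant, extra=()):
    """Features enabled by -m machine -V variant, plus any -F features in extra."""
    auto = {
        feature
        for feature, variants in AUTO_FEATURES[machine].items()
        if variant in variants
    }
    # -F can be given several times; each extra feature is simply added.
    return auto | set(extra)
```

Under this reading, -m falcon -V fuc4 enables fuc3op, fuc4op and pc24, and adding -F crypt to -V fuc3 gives fuc3op plus crypt.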


-O <mode> 
Select processor mode. 


For g80: 
* vp: Vertex program 
* gp: Geometry program 
* fp: Fragment program 
* cp: Compute program 


-S <stride> 
Override stride length for ISA and variant (relevant in binary mode only). 


-M <mapfile> 
(envydis only) Load map file. 


-u «value» 
(envydis only) Set map file label value. 


4.1.4 Output 


-o «filename» 
(envyas only) Output to filename 
4.1.5 Output format


-n
(envydis only) Disable output coloring


-q
(envydis only) Disable printing address + opcodes.


-a
(envyas only) Decorate output with human-readable section names and labels


-w
(envyas only) Output as a sequence of hexadecimal 32-bit words instead of bytes


-W
(envyas only) Output as a sequence of hexadecimal 64-bit words instead of bytes


-i
(envyas only) Output as pure binary
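Conversely, the -w and -W output modes group the assembled bytes into wider hexadecimal words. The Python sketch below is hypothetical (format_hex_output is not part of envyas) and assumes little-endian grouping; the space separator, the lack of a 0x prefix and the error on a partial trailing word are illustrative choices, not envyas's documented output.

```python
def format_hex_output(code, word_bytes=1):
    """Hypothetical sketch: render assembled bytes as hex bytes (default),
    32-bit words (word_bytes=4, like -w) or 64-bit words (word_bytes=8, like -W).

    Words are assumed to be little-endian; the exact textual layout envyas
    prints (separators, prefixes, line wrapping) is not modelled.
    """
    if len(code) % word_bytes:
        raise ValueError("code length is not a multiple of the word size")
    words = []
    for i in range(0, len(code), word_bytes):
        value = int.from_bytes(code[i:i + word_bytes], "little")
        # Zero-pad to two hex digits per byte of the word.
        words.append(f"{value:0{2 * word_bytes}x}")
    return " ".join(words)
```

With these assumptions, the four bytes ef be ad de print as "ef be ad de" by default and as "deadbeef" in 32-bit word mode.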


CHAPTER 5


TODO list 


Todo: map out the BAR fully
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/bars.rst, line 88.)

Todo: RE it. or not.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/bars.rst, line 133.)

Todo: It's present on some NV4x
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/bars.rst, line 144.)

Todo: figure out size
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/bars.rst, line 184.)

Todo: figure out NV3
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/bars.rst, line 185.)

Todo: verify G80
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/bars.rst, line 186.)

Todo: MSI
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/bars.rst, line 203.)

Todo: are EVENTS variants right?
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/hwsq.rst, line 54.)

Todo: cleanup, crossref
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/hwsq.rst, line 56.)

Todo: 8, 9, 13 seem used by microcode!
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/hwsq.rst, line 278.)

Todo: check variants for 15f4, 15fc
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/hwsq.rst, line 279.)

Todo: check variants for 4-7, some NV4x could have it
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/hwsq.rst, line 280.)

Todo: check variants for 14, 15
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/hwsq.rst, line 281.)

Todo: doc 1084 bits
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/hwsq.rst, line 282.)

Todo: connect
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pbus.rst, line 49.)

Todo: loads and loads of unknown registers not shown
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pbus.rst, line 74.)

Todo: document other known stuff
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pbus.rst, line 96.)

Todo: cleanup
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pbus.rst, line 104.)

Todo: description, maybe move somewhere else
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pbus.rst, line 183.)

Todo: verify that it's host cycles
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pbus.rst, line 192.)

Todo: nuke this file and write a better one - it sucks.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 9.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 15.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 23.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 27.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 31.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 39.)

Todo: wrong on NV3
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 56.)

Todo: this register and possibly some others don't get written when poked through actual PCI config accesses -
PBUS writes work fine
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 57.)

Todo: NV40 has something at 0x98
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 66.)

Todo: MCP77, MCP79, MCP89 stolen memory regs at 0xf4+
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 67.)

Todo: very incomplete
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 74.)

Todo: is that all?
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 97.)

Todo: find it
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pci.rst, line 103.)

Todo: more info
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pfuse.rst, line 18.)

Todo: fill me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pfuse.rst, line 31.)

Todo: unk bitfields
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 99.)

Todo: what is this? when was it introduced? seen non-0 on at least G92
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 108.)

Todo: there are cards where the steppings don't match between registers - does this mean something or is it just a
random screwup?
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 119.)

Todo: figure out the CS thing, figure out the variants. Known not to exist on NV40, NV43, NV44, C51, G71; known
to exist on MCP73
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 205.)

Todo: unknowns
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 240.)

Todo: RE these three
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 302.)

Todo: change all this duplication to indexing
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 326.)

Todo: check
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 412.)

Todo: figure out unknown interrupts. They could've been introduced much earlier, but we only know them from
bitscanning the INTR_MASK regs on GT215+.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 468.)

Todo: unknowns
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 501.)

Todo: document these two
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 503.)

Todo: verify variants for these?
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pmc.rst, line 513.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pring.rst, line 9.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pring.rst, line 15.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pring.rst, line 23.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pring.rst, line 31.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/pring.rst, line 39.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/prma.rst, line 13.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/prma.rst, line 44.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/prma.rst, line 48.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/prma.rst, line 52.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/prma.rst, line 56.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/prma.rst, line 60.)

Todo: document that some day.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/ptimer.rst, line 29.)

Todo: figure these out
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/ptimer.rst, line 215.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/ptimer.rst, line 235.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/ptimer.rst, line 239.)

Todo: document MMIO_FAULT_*
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/ptimer.rst, line 241.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/punits.rst, line 9.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/punits.rst, line 15.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/bus/punits.rst, line 23.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pco: line 9.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pco: line 15.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pco: line 23.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis line 9.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis line 15.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis line 23.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis line 27.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis line 35.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis line 43.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis
daemon.rst, line 9.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis
daemon.rst, line 15.)

Todo: more interrupts?
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis
daemon.rst, line 88.)

Todo: interrupt refs
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis
daemon.rst, line 89.)

Todo: MEMIF interrupts
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis
daemon.rst, line 90.)

Todo: determine core clock
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis
daemon.rst, line 91.)

Todo: refs
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pdis
daemon.rst, line 121.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pkfi line 9.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pkfi line 15.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pkfi line 23.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pun line 9.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pun line 15.)

Todo: MEMIF interrupts
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pun line 77.)

Todo: determine core clock
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pun line 78.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pun line 88.)

Todo: figure out unknowns
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/pun line 104.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/vga. line 9.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/vga. line 15.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/vga. line 23.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/g80/vga. line 27.)

Todo: regs 0x1c-0xff
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 90.)

Todo: regs 0x1xx and 0x5xx
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 91.)

Todo: regs 0xf0xx
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 92.)

Todo: RE me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 137.)

Todo: RE me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 147.)

Todo: RE me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 153.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 169.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 173.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 179.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 185.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 187.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 296.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 310.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 319.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 344.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 346.)

Todo: some newer DACs have more functionality?
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pda line 376.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 9.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 15.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 42.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 61.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 65.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 69.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 73.)

Todo: unknowns
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 123.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 127.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 133.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 137.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 141.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 145.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 149.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 153.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 157.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 161.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 165.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 169.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/pfb. line 173.)

Todo: figure out what the fuck this engine does
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/prm line 13.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/prm line 30.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/prm line 34.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/prm line 47.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/prm line 67.)

Todo: write me
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/prm line 117.)

Todo: write me
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(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/display/nv 1/prm 


line 127.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/display/nv 1/prm 


line 131.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/prm line 149.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/prm line 157.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/prm line 161.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/prm line 165.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv1/prm line 169.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/pcrt line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/pcrt line 15.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/pcrt line 21.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/pcrt line 25.)


Todo: complete me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/pcrt line 32.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/pcrt line 39.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/pcrt line 47.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/prar line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/prar line 15.)


Todo: complete me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/prar line 26.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/prar line 30.)


Todo: complete me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/prar line 33.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/ptv. line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/ptv. line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/ptv. line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/pvic line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/pvic line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/pvic line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/pvic line 31.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/vga. line 9.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/vga. line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/vga. line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/display/nv3/vga. line 27.)


Todo: document ljmp/lcall 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/branch.rst, line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/crypt.rst, line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/crypt.rst, line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/crypt.rst, line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/crypt.rst, line 31.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/crypt.rst, line 39.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/crypt.rst, line 47.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/crypt.rst, line 55.)


Todo: document UAS 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/data.rst, line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/debug.rst, line 7.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/debug.rst, line 17.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/fifo.rst, line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/fifo.rst, line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/fifo.rst, line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/fifo.rst, line 32.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/fifo.rst, line 41.)


Todo: figure out interrupt 5 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/intr.rst, line 33.)


Todo: check edge/level distinction on v0 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/intr.rst, line 85.)


Todo: didn't ieX -> isX happen before v4? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/intr.rst, line 196.)


Todo: figure out remaining circuitry 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/intro.rst, line 35.)


Todo: figure out v4 new stuff 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/intro.rst, line 48.)


Todo: figure out v4.1 new stuff 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/intro.rst, line 49.)


Todo: figure out v5 new stuff 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/intro.rst, line 50.)


Todo: document v4 new addressing 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/io.rst, line 42.)


Todo: list incomplete for v4 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/io.rst, line 159.)


Todo: clean. fix. write. move. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/io.rst, line 196.)


Todo: subope 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/io.rst, line 318.)


Todo: figure out v4+ stuff 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/isa.rst, line 67.)


Todo: long call/branch 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/isa.rst, line 131.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/memif.rst, line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/memif.rst, line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/memif.rst, line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/memif.rst, line 32.)


Todo: write me 


nVidia Hardware Documentation, Release git 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/memif.rst, line 40.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/memif.rst, line 48.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/perf.rst, line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/perf.rst, line 15.)


Todo: docs & RE, please 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/perf.rst, line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/perf.rst, line 58.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/proc.rst, line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/proc.rst, line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/proc.rst, line 121.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/proc.rst, line 129.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/proc.rst, line 137.)


Todo: check interaction of secret / usable flags and entering/exiting auth mode 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/vm.rst, line 24.)


Todo: one more unknown flag on secret engines 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/xfer.rst, line 33.)


Todo: figure out bit 1. Related to 0x10c2 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/xfer.rst, line 189.)


Todo: how to wait for xfer finish using only IO? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/xfer.rst, line 194.)


Todo: bits 4-5 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/xfer.rst, line 210.)


Todo: RE and document this stuff, find if there's status for code xfers 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/falcon/xfer.rst, line 212.)


Todo: check for NV4-style mode on GF100 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 27.)


Todo: verify those 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 107.)


Todo: determine what happens on GF100 on all imaginable error conditions 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 109.)


Todo: check channel numbers 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 171.)


Todo: What about GF100? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 195.)


Todo: check the ib size range 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 231.)


Todo: figure out bit 8 some day 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 257.)


Todo: do an exhaustive scan of commands 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 302.)


Todo: didn’t mthd 0 work even if sli_active=0? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 340.)


Todo: check pusher reaction on ACQUIRE submission: pause? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 347.)


Todo: check bitfield boundaries 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 425.)


Todo: check the extra SLI bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 427.)


Todo: look for other forms 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/dma-pusher.rst, line 429.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/g80-pfifo.rst, line 13.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/g80-pfifo.rst, line 21.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/g80-pfifo.rst, line 25.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/g80-pfifo.rst, line 33.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/g80-pfifo.rst, line 42.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/g80-pfifo.rst, line 50.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/g80-pfifo.rst, line 56.)


Todo: document me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/g80-pfifo.rst, line 60.)


Todo: document me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/g80-pfifo.rst, line 64.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/gf100-pfifo.rst, line 13.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/gf100-pfifo.rst, line 21.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/gf100-pfifo.rst, line 25.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/gf100-pfifo.rst, line 29.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/gf100-pfifo.rst, line 35.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/gf100-pfifo.rst, line 43.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/gf100-pspoon.rst, line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/gf100-pspoon.rst, line 14.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/gf100-pspoon.rst, line 22.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/gf100-pspoon.rst, line 30.)


Todo: check if it still holds on GF100 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/intro.rst, line 32.)


Todo: check PIO channels support on NV40:G80 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/intro.rst, line 129.)


Todo: look for GF100 PFIFO endian switch 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/intro.rst, line 189.)


Todo: is it still true for GF100, with VRAM-backed channel control area? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/intro.rst, line 194.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 126.)


Todo: document gray code 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 211.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 215.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 219.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 223.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 227.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 231.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 235.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 239.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 243.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 247.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 253.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 257.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 261.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 267.)


Todo: document me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 271.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 279.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 283.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 287.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 291.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 299.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 303.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 307.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 311.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 315.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 319.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 323.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 327.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 331.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 337.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 341.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 349.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 357.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 365.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv1-pfifo.rst, line 369.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv4-pfifo.rst, line 13.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv4-pfifo.rst, line 21.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv4-pfifo.rst, line 25.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv4-pfifo.rst, line 33.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv4-pfifo.rst, line 39.)


Todo: document me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv4-pfifo.rst, line 43.)


Todo: document me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/nv4-pfifo.rst, line 47.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/pcopy.rst, line 114.)


Todo: describe PCOPY 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/pcopy.rst, line 116.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/pio.rst, line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/pio.rst, line 14.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/pio.rst, line 22.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/pio.rst, line 28.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/pio.rst, line 34.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/pio.rst, line 42.)


Todo: missing the GF100+ methods 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/puller.rst, line 50.)


Todo: verify this on all card families. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/puller.rst, line 198.)


Todo: verify all of the pseudocode... 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/puller.rst, line 303.)


Todo: figure this out 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/puller.rst, line 424.)


Todo: RE timeouts 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/puller.rst, line 433.)


Todo: is there ANY way to make G80 reject non-DMA object classes? 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/puller.rst, line 450.)


Todo: bit 12 does something on GF100? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/puller.rst, line 546.)


Todo: check how this is reported on GF100 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/fifo/puller.rst, line 620.)


Todo: what were the GPIOs for? 


(The original entry is located in Celsius, line 1.) 


Todo: verify all sorts of stuff on NV2A 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/gpu.rst, line 181.)


Todo: figure out NV34 3d engine changes 


(The original entry is located in Rankine, line 1.) 


Todo: more changes 


(The original entry is located in Curie, line 1.) 


Todo: figure out 3d engine changes 


(The original entry is located in Curie, line 1.) 


Todo: all geometry information unverified 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/gpu.rst, line 234.)


Todo: any information on the RSX? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/gpu.rst, 
line 236.) 


584 Chapter 5. TODO list 


nVidia Hardware Documentation, Release git 


Todo: geometry information not verified for G94, MCP77


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/gpu.rst, 
line 277.) 


Todo: figure out PGRAPH/PFIFO changes 


(The original entry is located in Kepler, line 1.) 


Todo: it is said that one of the GPCs [0th one] has only one TPC on GK106


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/gpu.rst, 
line 341.) 


Todo: what the fuck is GK110B? and GK208B? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/gpu.rst, 
line 343.) 


Todo: GK210 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/gpu.rst, 
line 345.) 


Todo: GK20A 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/gpu.rst, 
line 347.) 


Todo: GM20x, GP10x 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/gpu.rst, 
line 349.) 


Todo: another design counter available on GM107, another 4 on GP10x 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/gpu.rst, 
line 351.) 


Todo: TU117: one of the GPCs has only three TPCs (so 7 in total, not 8)


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/gpu.rst, 
line 353.) 




Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/blit.rst, 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/2d/blit.rst, 
line 19.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/blit.rst, 
line 25.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ctxobj. 
line 11.) 


Todo: check NV3+ 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ctxobj. 
line 142.) 


Todo: check if still applies on NV3+ 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ctxobj. 
line 181.) 


Todo: check NV3+ 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ctxobj. 
line 198.) 


Todo: check NV3+ 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ctxobj. 
line 211.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ctxobj. 
line 261.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ctxobj. 
line 269.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ctxobj. 
line 277.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ctxobj. 
line 285.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ctxobj. 
line 293.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/dvd.rst 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/dvd.rst 
line 19.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/dvd.rst 
line 25.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/gdi.rst, 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/gdi.rst, 
line 19.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/gdi.rst, 
line 25.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/gdi.rst, 
line 31.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/gdi.rst, 
line 37.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/gdi.rst, 
line 43.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/gdi.rst, 
line 49.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/2d/ifc.rst, 
line 11.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/2d/ifc.rst, 
line 19.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/2d/ifc.rst, 
line 27.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/2d/ifc.rst, 
line 35.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/2d/ifc.rst, 
line 43.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/2d/ifc.rst, 
line 51.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ifm.rst, 
line 14.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/2d/ifm.rst,
line 20.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/ifm.rst, 
line 26.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/2d/ifm.rst,
line 32.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 183.) 


Todo: figure out this enum 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 195.) 


Todo: figure out this enum 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 203.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 209.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 215.) 


Todo: check 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 225.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 353.) 


Todo: figure out what happens on ITM, IFM, BLIT, TEX*BETA 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 362.) 


Todo: NV3+ 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 385.) 


Todo: document that and BLIT 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 395.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 401.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 407.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 413.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 419.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 425.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 431.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 437.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 443.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 448.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 456.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/intro.rs 
line 464.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/nv1- 
tex.rst, line 16.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/nv1- 
tex.rst, line 22.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/nv1- 
tex.rst, line 28.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/nv1- 
tex.rst, line 34.) 


Todo: precise upconversion formulas 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/pattern 
line 351.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/sifm.rs 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/sifm.rs 
line 19.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/sifm.rs 
line 25.) 


Todo: PM TRIGGER? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/solid.rs 
line 65.) 


Todo: PATCH?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/solid.rs 
line 67.) 


Todo: add the patchcord methods 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/solid.rs 
line 69.) 


Todo: document common methods 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/solid.rs 
line 71.) 


Todo: document point methods 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/solid.rs 
line 92.) 


Todo: document line methods 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/solid.rs 
line 125.) 


Todo: document tri methods 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/solid.rs 
line 153.) 


Todo: document rect methods 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/solid.rs 
line 176.) 


Todo: document solid-related unified 2d object methods 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/2d/solid.rs 
line 182.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/bundles.rs 
line 169.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/bundles.rs 
line 442.) 


Todo: why is POINT SMOOTH ENABLE aliased here? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/bundles.rs 
line 578.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/celsius/3d. 
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/celsius/3d. 
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/celsius/pg 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/celsius/pg 
line 21.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/celsius/pg 
line 27.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/celsius/pg 
line 33.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/curie/3d.r: 
line 9.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/curie/3d.r: 
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/curie/pgra 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/curie/pgra 
line 21.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/curie/pgra 
line 27.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/curie/pgra 
line 33.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/curie/pgra 
line 39.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/3d.r 
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/3d.r 
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/com 
line 9.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/com 
line 15.) 


Todo: convert 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/ctxc 
line 5.) 


Todo: rather incomplete. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 43.) 


Todo: and vertex programs 2? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 59.) 


Todo: figure out the exact differences between these & the pipeline configuration business 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 61.) 


Todo: figure out and document the SRs 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 161.) 


Todo: figure out the semi-special c16[]/c17[]. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 179.) 


Todo: size granularity? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 193.) 


Todo: other program types? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 195.) 


Todo: describe the shader input spaces


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 205.) 


Todo: describe me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 212.) 


Todo: not true for GK104. Not complete either. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 231.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 237.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/cud: 
line 243.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/mac 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/mac 
line 19.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/pgr: 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/pgr: 
line 21.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/pgr: 
line 29.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/pgr: 
line 35.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/pgr: 
line 41.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/fermi/pgr: 
line 47.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 13.) 


Todo: WAIT FOR IDLE and PM TRIGGER 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 15.) 


Todo: check Direct3D version 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 56.) 


Todo: document NV1 NULL


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 74.) 


Todo: figure out wtf is the deal with TEXTURE objects 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 175.) 


Todo: find better name for these two


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 205.) 


Todo: check NV3 D3D version


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 228.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 283.) 


Todo: write something here 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 289.) 


Todo: beta factor size 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 344.) 


Todo: user clip state 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 346.) 


Todo: NV1 framebuffer setup 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 348.) 


Todo: NV3 surface setup 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 350.) 


Todo: figure out the extra clip stuff, etc. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 352.) 


Todo: update for NV4+


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 354.) 


Todo: NV3+ 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 386.) 


Todo: more stuff? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 405.) 


Todo: verify big endian on non-G80 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 434.) 


Todo: figure out NV20 mysterious warning notifiers 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 443.) 


Todo: describe GF100- notifiers 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 445.) 


Todo: 0x20 - NV20 warning notifier? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 468.) 


Todo: figure out if this method can be disabled for NV1 compat 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/intro.rst, 
line 576.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/kelvin/3d. 
line 9.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/kelvin/3d. 
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/kelvin/pgi 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/kelvin/pgi 
line 21.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/kelvin/pgi 
line 27.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/kelvin/pgi 
line 33.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/kepler/3d. 
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/kepler/3d. 
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/kepler/con 
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/kepler/con 
line 15.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/m2mf.rst,
line 11.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/m2mf.rst, 
line 19.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/m2mf.rst,
line 27.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/m2mf.rst,
line 33.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/graph/m2mf.rst, 
line 39.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/maxwell/2
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/maxwell/2
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/maxwell/c
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/maxwell/c
line 15.) 


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/dma.1
line 13.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/dma.1
line 19.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/dma.1
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/dma.1
line 27.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/dma.1
line 33.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/dma.1
line 37.)


Todo: Lots of speculation here. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 276.)


Todo: lots of unknown bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 355.)


Todo: lots of unknown bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 370.)


Todo: lots of unknown bits


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 383.)


Todo: Figure out what all that stuff does. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 433.)


Todo: bitfields 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 471.)


Todo: more bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 492.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 613.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 628.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 632.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 636.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 640.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 644.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 648.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 652.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 656.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 660.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 664.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 668.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 672.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 676.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 680.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 682.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 690.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 696.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 702.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 708.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/pgrap
line 714.) 


Todo: figure out selecting the right part of SRC_COLOR for IFC/IFM/BITMAP


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 71.) 


Todo: BLIT and source pixel discards 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 73.)


Todo: pseudocode, please


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 75.)


Todo: weird shit happens if blending is enabled and framebuffer is 8bpp. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 152.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 453.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 459.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 466.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 470.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 583.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 591.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 595.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 667.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 675.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/rop.rs
line 679.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 13.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 27.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 31.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 41.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 45.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 49.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 53.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 57.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 61.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 69.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 73.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 77.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 81.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 89.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 97.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/nv1/xy.rst
line 105.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/rankine/3c
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/rankine/3c
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/3d.rst
line 10.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/3d.rst
line 16.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pdma
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pdma
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pdma
line 23.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pdma
line 31.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pgrap
line 13.)


Todo: finish me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pgrap
line 25.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pgrap
line 108.)


Todo: figure out the bits, should be similar to the NV1 options


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pgrap
line 125.)


Todo: check M2MF source 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pgrap
line 132.)


Todo: check 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pgrap
line 138.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pgrap
line 148.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pgrap
line 154.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/riva/pgrap
line 160.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/3d.rs
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/3d.rs
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/comy
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/comy
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/crop.
line 8.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/ctxct
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 19.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 34.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 50.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 58.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 72.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 88.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 103.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 117.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 132.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 149.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 158.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 173.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 189.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 211.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 219.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 23.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 55.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 73.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 96.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 116.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 134.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 155.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 185.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 193.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 201.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda.
line 220.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 227.) 



Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 239.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 246.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 253.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 21.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda.
line 29.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 37.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 53.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 69.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 23.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 39.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda.
line 55.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 71.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 87.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 103.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda.
line 29.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 81.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 155.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 242.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 285.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 317.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 368.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 415.) 


Todo: check variants for preret/indirect bra 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda.
line 45.) 


Todo: wtf is up with $27? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 134.)


Todo: a bit more detail?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 166.) 


Todo: perhaps we missed something? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 170.) 


Todo: seems to always be 0x20. Is it really that boring, or does MP switch to a smaller/bigger stride sometimes? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 176.) 


Todo: when no-one's looking, rename the a[], p[], v[] spaces to something sane.


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 247.) 


Todo: discard mask should be somewhere too? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 287.) 


Todo: call limit counter 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 289.) 


Todo: there's some weirdness in barriers. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 306.) 


Todo: you sure of control instructions with non-0 w1b0-1?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 329.) 


Todo: what about other bits? ignored or must be 0? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 448.)


Todo: figure out where and how $a7 can be used. Seems to be a decode error more often than not...


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 571.) 


Todo: what address field is used in long control instructions? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 574.) 


Todo: verify the 127 special treatment part and direct addressing 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 647.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 671.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 676.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 24.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 44.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 58.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 80.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 98.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 118.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda.
line 22.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 43.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 58.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 74.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 88.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 101.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 107.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 115.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 121.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 21.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 38.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 52.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 66.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 81.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/cuda. 
line 96.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/pgra 
line 20.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/pgra
line 28.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/pgra 
line 36.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/pgra
line 42.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/pgra:
line 48.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/pgra:
line 54.)


Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/prop.
line 8.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/vfetc
line 8.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tesla/zrop.
line 8.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tnt/3d.rst,
line 10.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tnt/3d.rst,
line 16.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tnt/pgrapk
line 13.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tnt/pgrapk
line 21.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tnt/pgrapk
line 29.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tnt/pgrapk
line 35.) 
Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tnt/pgrapk
line 41.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tnt/pgrapk
line 47.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tnt/pgrapk
line 53.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/tnt/pgrapk
line 59.) 


Todo: intro? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/ctx.rst,
line 13.) 


Todo: intro? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/ctx.rst,
line 132.) 


Todo: intro? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/ctx.rst,
line 223.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/ctx.rst,
line 296.) 


Todo: NV25, NV30 have RAMs unaccounted for. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/intro.rs:
line 236.) 
Todo: Curie still has switchable RAMs unaccounted for.


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/intro.rs:
line 238.) 


Todo: None of the above is certain on Curie. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/intro.rs:
line 329.) 


Todo: Figure out how this works on Curie. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/intro.rs:
line 366.) 


Todo: How are things assembled on Curie? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/intro.rs:
line 466.) 


Todo: NV34 (and presumably all Kelvins and Rankines) has SIPOS, which is a copy of the first IBUF word with an unknown purpose.


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/isa.rst,
line 154.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/isa.rst,
line 258.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/isa.rst,
line 267.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/isa.rst,
line 273.) 


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/isa.rst,
line 279.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/isa.rst,
line 338.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/isa.rst,
line 344.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/isa.rst,
line 350.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/isa.rst,
line 356.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/isa.rst,
line 362.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/isa.rst,
line 368.) 


Todo: Incomplete list. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/mode.r:
line 359.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/graph/xf/mode.r:
line 365.) 


Todo: convert glossary 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/index.rst,
line 26.) 


Todo: finish file 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/intro.rst,
line 345.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/g80-gpio.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/g80-gpio.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/g80-gpio.rst,
line 23.)


Todo: figure out what else is stored in the EEPROM, if anything. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/nv1-peeprom.rst,
line 31.)


Todo: figure out how the chip ID is stored in the EEPROM. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/nv1-peeprom.rst,
line 32.)


Todo: figure out wtf the chip ID is used for 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/nv1-peeprom.rst,
line 33.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/nv10-gpio.rst,
line 9.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/nv10-gpio.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/nv10-gpio.rst,
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/nv10-gpio.rst,
line 32.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/nv10-gpio.rst,
line 40.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/nv10-gpio.rst,
line 44.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/nv10-gpio.rst,
line 48.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pmedia.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pmedia.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pmedia.rst,
line 23.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pmedia.rst,
line 31.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pnvio.rst,
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pnvio.rst,
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pnvio.rst,
line 23.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pnvio.rst,
line 27.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 23.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 27.) 


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 31.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 35.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 43.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 47.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 53.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 61.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 63.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 71.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/prom.rst,
line 75.)


Todo: RE me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pstraps.rst,
line 304.) 


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pstraps.rst,
line 310.) 


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pstraps.rst,
line 316.) 


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/io/pstraps.rst,
line 322.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-comp.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-comp.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-host-mem.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-host-mem.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-host-mem.rst,
line 23.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-host-mem.rst,
line 27.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-host-mem.rst,
line 35.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-p2p.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-p2p.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-p2p.rst,
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-pfb.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-pfb.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-pfb.rst,
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-remap.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-remap.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-remap.rst,
line 23.)


Todo: vdec stuff 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 28.)


Todo: GF100 ZCULL? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 29.)


Todo: check the pitch, width, and height min/max values. These may depend on the binding point. Check if the 64-byte alignment still holds on GF100.


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 90.)


Todo: check boundaries on them all; check tiling on GF100.


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 124.)


Todo: PCOPY surfaces with weird gob size 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 125.)


Todo: wtf is up with modes 4 and 5?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 551.)


Todo: nail down MS8 CS24 sample positions 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 552.)
Todo: figure out mode 6


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 553.)


Todo: figure out MS8 CS24 C component 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 554.)


Todo: check MS8/128bpp on GF100.


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 560.)


Todo: wtf is color format 0x1d?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 638.)


Todo: htf do I determine if a surface format counts as 0x07 or 0x08? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 721.)


Todo: which component types are valid for a given bitfield size? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 809.)


Todo: clarify float encoding for weird sizes 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 810.)


Todo: verify I haven't screwed up the ordering here 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 841.)


Todo: figure out the MS8 CS24 formats


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 933.)
Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 939.)


Todo: figure out more. Check how it works with 2d engine. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 957.)


Todo: verify somehow. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 981.)


Todo: reformat 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 1033.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 1136.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-surface.rst,
line 1142.)


Todo: kill this list in favor of an actual explanation 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vm.rst,
line 45.)


Todo: PVP1 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vm.rst,
line 309.)


Todo: PME 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vm.rst,
line 310.)
Todo: Move to engine doc?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vm.rst,
line 311.)


Todo: verify GT215 transition for medium pages 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vm.rst,
line 518.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vm.rst,
line 618.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vm.rst,
line 624.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vm.rst,
line 630.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vm.rst,
line 636.)


Todo: verify it’s really the G84 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vram.rst,
line 128.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vram.rst,
line 187.)


Todo: tag stuff? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vram.rst,
line 241.)
Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vram.rst,
line 247.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vram.rst,
line 253.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/g80-vram.rst,
line 259.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-comp.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-comp.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-host-mem.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-host-mem.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-host-mem.rst,
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-host-mem.rst,
line 27.)
Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-host-mem.rst,
line 36.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-p2p.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-p2p.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-p2p.rst,
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-vm.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-vm.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-vram.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/gf100-vram.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv1-pdma.rst,
line 9.)
Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv1-pdma.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv1-pdma.rst,
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv1-pdma.rst,
line 31.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv1-pdma.rst,
line 39.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv1-pdma.rst,
line 45.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv1-surface.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv1-surface.rst,
line 15.)


Todo: wtf is the password storage thing, and why is it located at an inconvenient and unmovable place? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv1-vram.rst,
line 45.)


Todo: verify you cannot go between the two buffers by overflowing Y 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv1-vram.rst,
line 90.)
Todo: figure out what RAMAU and UNK2 are for


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv1-vram.rst,
line 152.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv10-pfb.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv10-pfb.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv10-pfb.rst,
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv3-dmaobj.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv3-dmaobj.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv3-pfb.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv3-pfb.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv3-pfb.rst,
line 23.)
Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv3-vram.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv3-vram.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv4-dmaobj.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv4-dmaobj.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv4-vram.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv4-vram.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv4-vram.rst,
line 21.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv4-vram.rst,
line 25.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv40-pfb.rst,
line 9.)
Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv40-pfb.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv40-pfb.rst,
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv44-host-mem.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv44-host-mem.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv44-host-mem.rst,
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv44-pfb.rst,
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv44-pfb.rst,
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/nv44-pfb.rst,
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pbfb.rst 
line 9.) 
Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pbfb.rst
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pbfb.rst, line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pbfb.rst, line 27.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pbfb.rst, line 35.)


Todo: convert 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/peepho
line 58.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pffb.rst, line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pffb.rst, line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pffb.rst, line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pffb.rst, line 31.)
Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pmfb.rs
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pmfb.rs
line 15.)


Todo: fill me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pmfb.rs
line 29.)


Todo: fill me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pmfb.rs
line 33.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pmfb.rs
line 41.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pxbar.r
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pxbar.r
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/memory/pxbar.r
line 23.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 13.)
Todo: check UNK005000 variants [sorta present on NV35, NV34, C51, MCP73; present on NV5, NV11, NV17, NV1A, NV20; not present on NV44]


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 128.)


Todo: check PCOUNTER variants 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 129.)


Todo: some IGP don’t have PVPE/PVP1 [C51: present, but without PME; MCP73: not present at all]


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 130.)


Todo: check PSTRAPS on IGPs 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 131.)


Todo: check PROM on IGPs 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 132.)


Todo: PMEDIA not on IGPs [MCP73 and C51: not present] and some other cards? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 133.)


Todo: PFB not on IGPs 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 134.)


Todo: merge PCRTC+PRMCIO/PRAMDAC+PRMDIO? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 135.)


Todo: UNK6E0000 variants 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 136.)


Todo: UNK006000 variants 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 137.)


Todo: UNK00E000 variants


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 138.)


Todo: 102000 variants; present on MCP73, not C51 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 139.)


Todo: 10f000:112000 range on GT215- 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 208.)


Todo: verified accurate for GK104, check on earlier cards 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 274.)


Todo: did they finally kill off PMEDIA? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 275.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 307.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 313.)


Todo: RE me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 319.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 325.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 331.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 335.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 339.)


Todo: NV4x? NVCx? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 345.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 349.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 353.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 357.)


Todo: RE me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 361.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 367.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 371.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 375.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 379.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 383.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 387.)


Todo: RE me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/mmio.rst, line 391.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/nv1-paudio.rst, line 13.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/nv1-paudio.rst, line 21.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/nv1-paudio.rst, line 29.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/nv1-paudio.rst, line 35.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/nv1-paudio.rst, line 41.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/nv1-paudio.rst, line 47.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/nv1-paudio.rst, line 53.)


Todo: wtf is with that 0x21x ID?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pciid.rst, line 482.)


Todo: shouldn't 0x03b8 support x4 too? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pciid.rst, line 2212.)


Todo: convert 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/fermi.r
line 9.)


Todo: crossrefs 
nVidia Hardware Documentation, Release git


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 9.)


Todo: why? any others excluded? NV25, NV2A, NV30, NV36 pending a check 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 19.)


Todo: figure out what else happened on GF100 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 50.)


Todo: make it so 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 55.)


Todo: figure out interrupt business


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 96.)


Todo: wtf is CYCLES_ALT for?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 138.)


Todo: C51 has no PCOUNTER, but has a7f4/a7f8 registers 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 176.)


Todo: MCP73 also has a7f4/a7f8 but also has normal PCOUNTER 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 178.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 186.)


Todo: complete me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 205.)


Todo: PAUSED? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 334.)


Todo: unk bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 336.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 380.)


Todo: UNK8 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 459.)


Todo: check bits 16-20 on GF100 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 651.)


Todo: figure out how single event mode is supposed to be used on GF100+ 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 672.)


Todo: wtf is CYCLES_ALT?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 696.)


Todo: figure out what's the deal with GF100 counters 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 768.)


Todo: figure out if there's anything new on GF100 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 829.)


Todo: unk bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 933.)


Todo: more bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 946.)


Todo: GF100 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 948.)


Todo: threshold on GF100 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 1064.)


Todo: check if still valid on GF100 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 1192.)


Todo: figure out record mode setup for GF100 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/intro.rs
line 1274.)


Todo: convert 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/nv10.rs
line 9.)


Todo: figure it out 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/nv40.rs
line 37.)


Todo: find some, I don't know, signals? 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/nv40.rs
line 39.)


Todo: figure out roughly what stuff goes where 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/tesla.rs
line 44.)


Todo: find signals. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pcounter/tesla.rs
line 46.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/g80-clock.rst, line 9.)


Todo: figure out IOCLK, ZPLL, DOM6 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/g80-clock.rst, line 38.)


Todo: figure out 4010, 4018, 4088 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/g80-clock.rst, line 39.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/g80-clock.rst, line 44.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/g80-clock.rst, line 52.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/g80-clock.rst, line 56.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/g80-clock.rst, line 60.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/g80-clock.rst, line 68.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/g80-clock.rst, line 76.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/g80-clock.rst, line 84.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/g80-clock.rst, line 92.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gf100-clock.rst, line 9.)


Todo: how many RPLLs are there exactly? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gf100-clock.rst, line 33.)


Todo: figure out where host clock comes from 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gf100-clock.rst, line 34.)


Todo: VM clock is a guess 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gf100-clock.rst, line 35.)


Todo: memory clock uses two PLLs, actually 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gf100-clock.rst, line 36.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gf100-clock.rst, line 48.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gf100-clock.rst, line 52.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gf100-clock.rst, line 60.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gf100-clock.rst, line 68.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gf100-clock.rst, line 76.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gt215-clock.rst, line 9.)


Todo: figure out unk clocks 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gt215-clock.rst, line 32.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gt215-clock.rst, line 37.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gt215-clock.rst, line 45.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gt215-clock.rst, line 49.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gt215-clock.rst, line 53.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gt215-clock.rst, line 73.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gt215-clock.rst, line 81.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/gt215-clock.rst, line 89.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv1-clock.rst, line 9.)


Todo: DLL on NV3 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv1-clock.rst, line 24.)


Todo: NV1??? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv1-clock.rst, line 25.)


Todo: write me 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv1-clock.rst, line 29.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv1-clock.rst, line 37.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv1-clock.rst, line 41.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv1-clock.rst, line 49.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv1-clock.rst, line 53.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv1-clock.rst, line 61.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv1-clock.rst, line 65.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv40-clock.rst, line 9.)


Todo: figure out where host clock comes from 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv40-clock.rst, line 26.)


Todo: figure out 4008/shader clock 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv40-clock.rst, line 27.)


Todo: figure out 4050, 4058 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv40-clock.rst, line 28.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv40-clock.rst, line 33.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv40-clock.rst, line 41.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv40-clock.rst, line 45.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv40-clock.rst, line 53.)


Todo: figure out what divisors are available 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv43-therm.rst, line 57.)


Todo: figure out what divisors are available 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv43-therm.rst, line 95.)


Todo: Make sure this clock range is safe on all cards 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv43-therm.rst, line 111.)


Todo: There may be other switches. 




(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv43-therm.rst, line 152.)


Todo: Document reg 15b8 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/nv43-therm.rst, line 215.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/ch
line 8.)


Todo: check the frequency at which PDAEMON is polling 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/coi
line 33.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/ep
line 9.)


Todo: and unknown stuff. 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/inc
line 53.)


Todo: figure out additions 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/inc
line 65.)


Todo: this file deals mostly with GT215 version now 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/inc
line 67.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/io.
line 9.)


Todo: reset doc 
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/io.
line 74.)


Todo: unknown v3+ regs at 0x430-


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/io.
line 75.)


Todo: 5c0-


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/io.
line 76.)


Todo: 6604 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/io.
line 77.)


Todo: finish the list 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/io.
line 78.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/per
line 7.)


Todo: discuss mismatched clock thing 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/per
line 9.)


Todo: figure out the first signal 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/per
line 34.)


Todo: document MMIO_* signals


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/per
line 35.)


Todo: document INPUT_*, OUTPUT_*




(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/per
line 36.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/sig
line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/sig
line 15.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/sig
line 23.)


Todo: figure out bits 7, 8 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/sul
line 25.)


Todo: more bits in 10-12? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/sul
line 26.)


Todo: what could possibly use PDAEMON's busy status?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/pdaemon/use
line 17.)


Todo: check the possible dividers 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/ptherm.rst, line 80.)


Todo: verify the priorities of each threshold (if two thresholds are active at the same time, which one is considered 
as being active?) 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/ptherm.rst, line 132.)
Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/ptherm.rst, line 138.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/ptherm.rst, line 143.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/ptherm.rst, line 147.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/ptherm.rst, line 155.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/pm/ptherm.rst, line 163.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/pvcomp.rst, line 9.)


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/pvcomp.rst, line 15.)


Todo: status bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvcomp.rst 


line 87.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvcomp.rst 


line 97.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvcomp.rst 
line 99.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvcomp.rst 
line 107.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvdec.rst, 
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvdec.rst, 
line 15.) 


Todo: status bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvdec.rst, 
line 77.) 


Todo: interrupts 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvdec.rst, 
line 78.) 


Todo: MEMIF ports 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvdec.rst, 
line 79.) 


Todo: core clock 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvdec.rst, 
line 80.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvdec.rst, 
line 90.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvdec.rst, 


line 92.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvenc.rst, 


line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvenc.rst, 


line 15.) 


Todo: status bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvenc.rst, 


line 100.) 


Todo: interrupts 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvenc.rst, 


line 101.) 


Todo: MEMIF ports 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvenc.rst, 


line 102.) 


Todo: core clock 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvenc.rst, 


line 103.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvenc.rst, 


line 113.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvenc.rst, 


line 117.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/pvenc.rst, 
line 119.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/index.1 
line 22.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/macro. 
line 96.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/macro. 
line 104.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/macro. 
line 112.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/macro. 
line 174.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pbsp.r: 
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pbsp.r: 
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pbsp.r: 
line 23.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pbsp.r: 


line 31.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pciphe 


line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pciphe 


line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pciphe 


line 23.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pciphe 


line 31.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pvp2.r 


line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pvp2.r 


line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pvp2.r 


line 23.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/pvp2.r 


line 31.) 

Todo: width/height max may be 255?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/vld.rst, 
line 42.) 


Todo: reg 0x00800 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/vld.rst, 
line 94.) 


Todo: what macroblocks are stored, indexing, tagging, reset state 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/vld.rst, 
line 171.) 


Todo: and availability status? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/vld.rst, 
line 187.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/vld.rst, 
line 201.) 


Todo: RE and write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/vld.rst, 
line 225.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/vld.rst, 
line 233.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/vld.rst, 
line 243.) 


Todo: more inferred crap 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/vld.rst, 
line 451.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/vld.rst, 
line 496.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp2/xtensa. 
line 5.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/index.1 
line 21.) 


Todo: Verify whether X or Y is in the lowest 16 bits. I assume X 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/mbring 
line 118.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/ppdec. 
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/ppdec. 
line 15.) 


Todo: interrupts 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/ppdec. 
line 137.) 


Todo: more MEMIF ports? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/ppdec. 
line 138.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/ppdec. 
line 148.) 

Todo: unknowns


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/ppdec. 
line 173.) 


Todo: fix list 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/ppdec. 
line 174.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/ppdec. 
line 180.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/ppdec. 
line 188.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 
line 15.) 


Todo: interrupts 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 
line 123.) 


Todo: more MEMIF ports?


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 
line 124.) 


Todo: status bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 
line 125.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 135.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 157.) 


Todo: write 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 165.) 


Todo: write 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 173.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 179.) 


Todo: write 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 187.) 


Todo: write 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 195.) 


Todo: write 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 203.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 209.) 

Todo: write


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 217.) 


Todo: write 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 225.) 


Todo: write 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 233.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 239.) 


Todo: write 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 247.) 


Todo: write 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 256.) 


Todo: write 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 264.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 270.) 


Todo: write 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 278.) 

Todo: write


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pppp.r 


line 286.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/psec.r: 


line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/psec.r: 


line 15.) 


Todo: clock divider in 1530? 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/psec.r: 


line 103.) 


Todo: find out something about the GM107 version 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/psec.r: 


line 105.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/psec.r: 


line 115.) 


Todo: update for GM107 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/psec.r: 


line 129.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvdec. 


line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvdec. 


line 15.) 

Todo: interrupts


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvdec. 
line 70.) 


Todo: VM engine/client 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvdec. 
line 71.) 


Todo: MEMIF ports 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvdec. 
line 72.) 


Todo: status bits 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvdec. 
line 73.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvdec. 
line 83.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvdec. 
line 92.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvld.r: 
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvld.r: 
line 15.) 


Todo: MEMIF ports 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvld.r: 
line 130.) 

Todo: unknowns


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvld.r:


line 153.) 


Todo: fix list 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvld.r: 


line 154.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvld.r: 


line 162.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvld.r: 


line 172.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvld.r: 


line 174.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvld.r: 


line 182.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vp3/pvld.r: 


line 190.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/index.r 


line 19.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 13.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 22.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 58.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 62.) 


Todo: figure these out 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 68.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 74.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 80.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 87.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 94.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 101.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 108.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 117.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 124.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 130.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 136.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 142.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 148.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 154.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 


line 160.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 
line 166.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 
line 172.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 
line 178.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 
line 184.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 
line 190.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 
line 196.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 
line 202.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pme/ir 
line 210.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pmpeg 
line 9.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pmpeg 
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pmpeg 
line 23.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pmpeg 
line 31.) 


Todo: list me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/a 
line 144.) 


Todo: complete the list 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/a 
line 190.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/b
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/b
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/b
line 23.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/d
line 9.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/d
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/d
line 23.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/fi
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/fi
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/fi
line 23.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/fi
line 31.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/i
line 42.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/i
line 50.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/i
line 56.) 

Todo: incomplete for <G80


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/i
line 130.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/i
line 154.) 


Todo: mov from $sr, $uc, $mi, $f, $d 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/i
line 224.) 


Todo: some unused opcodes clear $c, some don't 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/s
line 238.) 


Todo: figure out the pre-G80 register files 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvp1/s
line 351.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvpe.r: 
line 9.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvpe.r: 
line 15.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvpe.r: 
line 25.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vpe/pvpe.r: 
line 33.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vuc/intro.rs 
line 147.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vuc/isa.rst, 
line 979.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vuc/isa.rst, 
line 1130.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vuc/isa.rst, 
line 1136.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vuc/perf.rst 
line 11.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vuc/vpring. 
line 11.) 


Todo: write me 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vuc/vreg.rs 
line 15.) 


Todo: the following information may only be valid for H.264 mode for now 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vuc/vreg.rs 
line 21.) 


Todo: recheck this instruction on VP3 and other codecs 


(The original entry is located in /home/docs/checkouts/readthedocs.org/user builds/envytools/checkouts/latest/docs/hw/vdec/vuc/vreg.rs 
line 228.) 

Todo: write me


(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/envytools/checkouts/latest/docs/nvrm/pmu/ucode-cmds.rst, line 13.)


CHAPTER 6


Indices and tables 


* genindex 


* search 


Index 


Symbols 



command line option 
-F <feature>, 545 


-M <mapfile>, 546
-O <mode>, 546
-S <stride>, 546
-V <variant>, 544


-W, 543, 547 
-a, 547


-b <base>, 544
-d <discard>, 544


-1, 543, 547 


-l <limit>, 544
-m <machine>, 544 


-n, 547 


-o <filename>, 546


-а, 547 


-u <value>, 546


-м, 543, 547 